C-type lectin fold as a scaffold for massive sequence variation

ABSTRACT

This invention provides a class of binding proteins with a range of binding specificities and affinities based upon variation at select amino acid positions within a scaffold. The variable positions may be readily modified to produce a library of binding proteins with different binding specificities and affinities. The library may be screened to identify one or more as binding a ligand of interest. Compositions comprising the binding proteins, as well as methods of using the binding proteins, are also provided.

STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY SPONSORED RESEARCH AND DEVELOPMENT

The present invention was supported in part with funding provided by Grant No. ______ from the ______. The federal government may have certain rights to this invention.

FIELD OF THE INVENTION

This invention relates to a class of binding proteins with a range of binding specificities and affinities based upon variation at select amino acid positions within a scaffold. The variable positions may be readily modified to produce a variety of binding proteins with different binding specificities and affinities. This range of proteins may be screened to identify one or more as binding a target molecule of interest. Compositions comprising the binding proteins, as well as methods of using the binding proteins, are also provided.

BACKGROUND OF THE INVENTION

The amino acid sequence of a protein determines its secondary, tertiary, and quaternary structure to result in the protein's final three-dimensional (3D) shape. The shape and functional groups (side chains) of the amino acids therein define the protein's function. In the case of a binding protein, the portion of the protein responsible for the binding activity (binding domain) must either be exposed, or be capable of being exposed, on an accessible surface of the protein exposed to the exterior solvent to provide for possible interaction with a binding target. Thus to vary the binding activity, the amino acid residues of the binding domain must be varied.

With an immunoglobulin as an example of a familiar binding protein with specificity and affinity, the “variable region” or binding domain includes six loops clustered in space. The loops provide the 6 complementarity determining regions (CDRs) and are contained in two polypeptides, a heavy chain and a light chain, each carrying 3 CDRs (H1, H2, and H3 of the heavy chain and L1, L2, and L3 of the light chain). The amino acid residues of the variable regions orient the CDRs toward the exterior solvent environment to permit their interaction with an antigen. High sequence variability of the amino acid residues of the CDRs allows immunoglobulins as a class to bind a large variety of antigens. The CDRs and non-CDR portion of the variable region form an immunoglobulin fold to determine the structure of the loops and thereby maintain the overall structure of the immunoglobulin variable region, with proper orientation of the CDRs.

But variability in the sequence of a protein, like an immunoglobulin, is often limited by the effects of variability on protein folding and the resulting final 3D shape. Amino acid residues with side chains that are not exposed to the exterior solvent are often limited in variability because as part of the protein's interior they must “fit” within the interior space as dictated by other amino acid residues. The protein can tolerate greater variability in residues with side chains oriented toward, and exposed to, the exterior solvent, given that they do not have to “fit” into an interior space constrained by other residues.

To diversify the binding functionality of a binding protein and thus promote recognition of members of a diverse population of target molecules, amino acid variability is necessary. Interactions between a binding protein and its target molecule (the ligand) are usually non-covalent and yet often very tight (high affinity or avidity) and specific. The intermolecular interactions are defined by the amino acid residues of the protein's binding domain which form a surface that fits “hand-in-glove” like onto the surface of the ligand being bound. The two contacting surfaces must have complementarity via hydrogen bonding (at times mediated by a water molecule), charge interactions, alignment of attracting dipoles, hydrophobic to hydrophobic (van der Waals) interactions, and/or protrusions fitting with depressions.

In the example of an immunoglobulin, the binding domain is presented within the context of the framework made up by the rest of the immunoglobulin molecule. The framework, generally referred to as the immunoglobulin fold, forms the scaffold of the protein structure and functions to correctly present the binding domain. The framework restrains the 3D shape of the protein so that the amino acid residues of the binding domain are positioned in a manner to create the accessible specific binding site.

The usefulness of immunoglobulins as manipulable binding proteins is limited, however, by the nature of the immunoglobulin framework, which requires two polypeptides to form the complete ligand- or antigen-binding site. This results in a number of disadvantages: the need to manipulate rather large polypeptides; the need for complicated molecular cloning to diversify a binding site; and the complication of modifying six different CDRs. The consequences of these disadvantages include constraints on using phage display (see for example U.S. Pat. Nos. 5,223,409 and 5,571,698) to diversify immunoglobulins for the purpose of creating new binding or other functional activities.

A number of attempts have been made to overcome the limitations of immunoglobulins. These include the use of a CTL4-like sandwich architecture as a framework for presenting randomized peptide sequences (see WO 00/60070); the use of fibronectin type III domains (see U.S. Pat. No. 6,818,418); the use of an “anticalin” (see WO 99/16873 and Beste et al. Proc. Natl. Acad. Sci., USA 96:1898-1903 (1999)); and even the use of single chain antibodies, optionally with a CH3 domain of an immunoglobulin to permit spontaneous dimerization.

Citation of documents herein is not intended as an admission that any is pertinent prior art. All statements as to the date or representation as to the contents of documents are based on the information available to the applicant and do not constitute any admission as to the correctness of the dates or contents of the documents.

BRIEF SUMMARY OF THE INVENTION

The present invention is related to the discovery of a diversity-generating retroelement (DGR) belonging to a Bordetella bacteriophage. The DGR has recently been shown to be capable of producing massive, targeted amino acid sequence variation in the phage's receptor-binding protein, the major tropism determinant (Mtd). See Liu, M. et al. “Reverse transcriptase-mediated tropism switching in Bordetella bacteriophage.” Science 295, 2091-4 (2002); Liu, M. et al. “Genomic and genetic analysis of Bordetella bacteriophages encoding reverse transcriptase-mediated tropism-switching cassettes.” J Bacteriol 186, 1503-17 (2004); and Doulatov, S. et al. “Tropism switching in Bordetella bacteriophage defines a family of diversity-generating retroelements.” Nature 431, 476-81 (2004). This genetically programmed diversity, with ~10¹³ different Mtd sequences possible, is rivaled in scale only by antibodies (immunoglobulins) and T cell receptors in the immune system (see Davis, M. M. & Bjorkman, P. J. “T-cell antigen receptor genes and T-cell recognition.” Nature 334, 395-402 (1988)).

Whereas the immune system requires variability in numerous gene segments to achieve antigen-binding diversity, the Bordetella phage DGR utilizes a single copy of mtd followed by a nearly identical (90%), 134-bp direct repeat of the 3′ end of mtd (see FIG. 1 herein). Genetic information in this direct repeat, called the template repeat (TR) due to its invariance, is converted into a cDNA altered by random insertion of A, G, C, or T specifically at sites occupied by adenines in TR through the action of a DGR-encoded reverse transcriptase. The mutagenized sequence is then substituted into the variable region (VR) of mtd by a process known as mutagenic homing, thereby producing an Mtd variant. Due to the adenine dependency of the mutagenic process mediated by the DGR reverse transcriptase, 12 amino acid residues in VR, encoded by codons corresponding to nucleotide triplets in TR with adenine residues at non-wobble positions, are subject to variation at high frequency. The effect of the resulting amino acid variation in VR is to alter the binding specificity of Mtd and consequently host tropism for the phage. These alterations are crucial to the phage's survival because its host, Bordetella, undergoes phase variation under different environmental conditions, and the expression patterns of bacterial cell surface receptors, such as pertactin, change with the phase. For example, Bvg-plus tropic phage-1 (BPP-1) infects only Bvg⁺ Bordetella, the pathogenic phase, since the Mtd-P1 variant expressed by this phage uses as its receptor the Bvg⁺-specific outer membrane protein, pertactin. When Bordetella encounters an ex vivo environment, it ceases expressing pertactin, becoming Bvg⁻ as it concomitantly becomes resistant to infection by BPP-1 (see Uhl, M. A. & Miller, J. F. “Integration of multiple domains in a two-component sensor protein: the Bordetella pertussis BvgAS phosphorelay.” EMBO J 15, 1028-36 (1996)).

However, the phage counters by producing Mtd variants, such as Mtd-M1, that use unknown receptors expressed exclusively by Bvg⁻ Bordetella, thereby creating Bvg-minus tropic phage (BMP). Alternatively, Mtd variants, such as Mtd-I1, are produced that infect through unknown receptors expressed by both phases of Bordetella, thereby creating Bvg-indiscriminate phage (BIP). Mtd variants, such as Mtd-3c, that confer infectivity towards Bvg⁺ Bordetella but use an unknown receptor instead of pertactin have also been found. The molecular protein structure with which Mtd creates diverse receptor-binding sites and tolerates massive sequence variation was not known prior to the present invention.
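
The adenine dependency of the variation described above can be illustrated with a minimal sketch. It does not model the DGR reverse transcriptase or mutagenic homing; it only enumerates, for a single TR codon, the amino acids that become reachable when every adenine in that codon is replaced by any of A, G, C, or T. The codons used (AAC, ATC, GGA) and the function name are illustrative assumptions and are not taken from the TR sequence of the phage.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reachable_amino_acids(tr_codon):
    """Return the set of residues ('*' = stop) encoded after every adenine
    in a hypothetical TR codon is replaced by any of the four nucleotides."""
    # Adenine sites may become any base; all other sites stay fixed.
    options = [BASES if base == "A" else base for base in tr_codon]
    return {CODON_TABLE["".join(codon)] for codon in product(*options)}


if __name__ == "__main__":
    # Illustrative codons only (assumptions, not the actual TR sequence).
    for codon in ("AAC", "ATC", "GGA"):
        residues = reachable_amino_acids(codon)
        print(codon, len(residues), "".join(sorted(residues)))
```

In this sketch, a codon with adenines at both non-wobble positions (AAC) reaches 15 different amino acids and no stop codon, whereas an adenine only at the wobble position (GGA) leaves the encoded residue unchanged, which is consistent with variation being concentrated at codons with non-wobble adenines.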

Mtd is found on the tails of Bordetella bacteriophage, which number 6 per phage particle. Based upon the discovery described herein, there appear to be 2 Mtd trimers per phage tail, and thereby 12 Mtd trimers per phage particle.

The invention is based in part on the discovery of the unexpected structures of multiple Mtd variants. The basic structure is a pyramid-shaped homotrimer with variable amino acid residues organized along the pyramid base by a C-type lectin (CTL)-fold that creates a discrete receptor-binding site in each of the three monomers. The present invention thus provides the use of the CTL-fold, or portion thereof, as a scaffold to orient the side chains of variable amino acid residues toward the external solvent environment. The side chains of the variable amino acid residues define, in whole or in part, the three dimensional structure or shape of all or part of the binding site, which is attached to the scaffold through the alpha carbons of each variable amino acid residue.

The present invention also provides for the use of CTL-folds as a scaffold for massive sequence variation of the variable amino acid residues, and thus the side chains thereof, in the manner exemplified by Bordetella bacteriophage. The availability of ~10¹³ possible combinations of variable amino acid residue side chains in the binding site provides a highly diverse population of binding proteins with different specificities. The extraordinary diversity available in this localized portion of the binding site provided by the scaffold provides differing shapes and chemical reactivities suitable for binding to and operating on a wide range of target molecules. This level of diversity provided to the binding site of a CTL-fold by the present invention is paralleled only by the antigen binding region of immunoglobulins and T cell receptors in the immune system. But unlike those examples, the binding proteins of the invention may be produced by modification of a single polypeptide chain to result in a highly diverse population of binding proteins. The single chain can be modified via recombinant methods, such as by recombinant use of the elements of the DGR of Bordetella bacteriophage.

The scaffold, or backbone conformation, present in the CTL-fold has been observed to provide a stable structure for the presentation of a binding site. As noted by Kogelberg et al. (Curr. Opin. Structural Biol., 11:635-643, 2001), the CTL-fold has closely spaced N and C termini which are opposite the binding site of the fold. Thus the invention provides for the use of the CTL-fold to present a binding site with variable residues that may be varied without compromising the maintenance of the structural integrity of the CTL-fold. In the case of Mtd, the scaffold structure includes stabilization of loops in the binding site by two inserts and trimeric intertwining as well as other structures contributing to the CTL-fold. In the case of other CTL-folds, the scaffold is similarly stabilized by the structures present in the scaffold, such as, but not limited to, the presence of disulfide bridges that contribute to the integrity of the CTL-fold. The CTL-fold, therefore, provides a stable, highly tolerant scaffold for combinatorial display of the side chains of variable amino acid residues used to form all or part of a binding site.

The availability of a scaffold to present diverse binding sites permits the generation of binding proteins with different specificities and affinities for binding a wide number of different target molecules, particularly biomolecules. The binding proteins may be used to bind, and thus detect, identify, localize or modify, such target molecules.

The invention thus provides, in one aspect, for a protein scaffold comprising a variable binding site comprising the amino acid sequence (SEQ ID NO:1) -Xaa₁-Trp-Xaa₂-Xaa₃-Xaa₄-Ser-Xaa₅-Ser-Gly-Ser-Arg-Ala-Ala-Xaa₆-Trp-Xaa₇-Xaa₈-Gly-Pro-Ser-Xaa₉-Ser-Xaa₁₀-Ala-Xaa₁₁-Xaa₁₂-

wherein each of Xaa₁ to Xaa₁₂ is independently any amino acid residue, the side chains of which form a binding site, in whole or in part.
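
As a hedged illustration of how SEQ ID NO:1 might be handled computationally, the sketch below converts the sequence as written above into a regular expression, counts its variable (Xaa) positions, and tests whether a candidate polypeptide contains the scaffold. The helper names and the candidate sequence are hypothetical, and restricting "any amino acid residue" to the 20 standard residues is an assumption made only for this example.

```python
import re

# One-letter codes for the fixed residues that occur in SEQ ID NO:1.
THREE_TO_ONE = {"Trp": "W", "Ser": "S", "Gly": "G",
                "Arg": "R", "Ala": "A", "Pro": "P"}

# Assumption for this sketch: "any amino acid" = the 20 standard residues.
ANY_RESIDUE = "[ACDEFGHIKLMNPQRSTVWY]"

# SEQ ID NO:1 as written in the text; "Xaa" marks a variable position.
SEQ_ID_NO_1 = ("Xaa-Trp-Xaa-Xaa-Xaa-Ser-Xaa-Ser-Gly-Ser-Arg-Ala-Ala-"
               "Xaa-Trp-Xaa-Xaa-Gly-Pro-Ser-Xaa-Ser-Xaa-Ala-Xaa-Xaa")
RESIDUES = SEQ_ID_NO_1.split("-")


def motif_to_regex(residues):
    """Build a compiled regex from a list of three-letter residue codes."""
    parts = [ANY_RESIDUE if r == "Xaa" else THREE_TO_ONE[r] for r in residues]
    return re.compile("".join(parts))


PATTERN = motif_to_regex(RESIDUES)
N_VARIABLE = RESIDUES.count("Xaa")


def contains_scaffold(protein):
    """True if the one-letter protein sequence contains SEQ ID NO:1."""
    return PATTERN.search(protein) is not None


if __name__ == "__main__":
    # Hypothetical candidate: every variable position filled with Ala.
    candidate = "".join("A" if r == "Xaa" else THREE_TO_ONE[r] for r in RESIDUES)
    print(len(RESIDUES), N_VARIABLE, contains_scaffold(candidate))
    # Upper bound if all 20 residues were allowed at every variable position.
    print(20 ** N_VARIABLE)
```

With 12 variable positions, unrestricted substitution gives an upper bound of 20¹² (about 4 x 10¹⁵) sequences; the sketch does not attempt to reproduce the ~10¹³ figure stated elsewhere in this document.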

The scaffold serves as a framework to present variable amino acid residues, the side chains of which form the binding site of the protein. Preferably, the scaffold is derived from, and forms all or part of, a CTL-fold which displays or exposes the binding site to the external solvent environment. Thus the invention includes the above sequence (wherein SEQ ID NO:1 constitutes all or part of the binding site of the scaffold) in a non-Mtd, CTL-fold as the scaffold. The scaffold may optionally be conjugated to another polypeptide or other molecule through residues distant from the binding site.

In another aspect, the invention also provides a binding protein comprising a scaffold as described above. The binding specificity of the protein is determined by the variable binding site, and the protein comprises a scaffold comprising the amino acid sequence (SEQ ID NO:1) -Xaa₁-Trp-Xaa₂-Xaa₃-Xaa₄-Ser-Xaa₅-Ser-Gly-Ser-Arg-Ala-Ala-Xaa₆-Trp-Xaa₇-Xaa₈-Gly-Pro-Ser-Xaa₉-Ser-Xaa₁₀-Ala-Xaa₁₁-Xaa₁₂-

wherein each of Xaa₁ to Xaa₁₂ is independently any amino acid residue, the side chains of which form a binding site, in whole or in part, that determines the binding specificity of the protein; and

at each of the Xaa₁ and Xaa₁₂ ends of the scaffold are amino acid sequences that form a superscaffold which displays said binding site in a solvent exposed portion of the protein, or one of the Xaa₁ and Xaa₁₂ ends of the scaffold is -H (a covalently bonded hydrogen atom) and the other end is an amino acid sequence that forms a superscaffold which displays said binding site in a solvent exposed portion of the protein.

The side chains of the variable (Xaa) residues may form the whole of the binding site where no other side chains of the protein contribute to binding interactions with a target molecule bound by the protein. Alternatively, other side chains of the protein, such as those of other amino acid residues in the scaffold or superscaffold, may contribute to the binding interactions with a target molecule. In this case, the side chains of the variable residues only compose part of the binding site of the protein. Non-limiting examples of a target molecule include a viral antigen, a bacterial antigen, a fungal antigen, an enzyme, an enzyme inhibitor, a cell surface molecule of any composition, a reporter molecule, a serum protein, and a receptor. In the case of a viral antigen as a target molecule, it may be, but is not limited to, a polypeptide required for replication. Thus the binding sites of the invention, like immunoglobulin binding sites, recognize proteins (including native, denatured, and proteolytic forms thereof as well as conformational determinants thereof); nucleic acids; polysaccharides (alone or as modifications on another molecule, such as a protein); lipids; and small chemical molecules (like haptens in the case of an antibody).

Optionally, the scaffold is extended at the Xaa₁ end by all or part of the sequence -Ala-Ala-Leu-Phe-Gly-Gly- (SEQ ID NO:2), wherein the extension may be by 1, 2, 3, 4, 5, or all 6 of the consecutive amino acid residues of SEQ ID NO:2 linked to Xaa₁ via the carboxyl end of the last Gly residue in SEQ ID NO:2. Alternatively, the scaffold is extended at the Xaa₁₂ end by all or part of the sequence -Gly-Ala-Arg-Gly-Val-Cys-Asp-His-Leu-Ile-Leu-Glu (SEQ ID NO:3), wherein the extension may be by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or all 12 of the consecutive amino acid residues linked to Xaa₁₂ via the amino end of the first Gly residue in SEQ ID NO:3. The scaffold may also be extended at both ends by any combination of the above extensions at Xaa₁ and Xaa₁₂ followed by further optional extensions. Where all 12 amino acids of SEQ ID NO:3 are present in a scaffold, preferred embodiments of the invention have no further extension at the C terminus by additional amino acid residues.
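
The optional extensions described above can be enumerated with a short sketch. It assumes one reading of the text: a partial N-terminal extension is the run of consecutive residues of SEQ ID NO:2 ending at its last Gly (the residue linked to Xaa₁), and a partial C-terminal extension is the run of consecutive residues of SEQ ID NO:3 starting at its first Gly (the residue linked to Xaa₁₂). The function names and the placeholder core sequence are hypothetical.

```python
# One-letter forms of the extension sequences given in the text.
SEQ_ID_NO_2 = "AALFGG"        # -Ala-Ala-Leu-Phe-Gly-Gly-
SEQ_ID_NO_3 = "GARGVCDHLILE"  # -Gly-Ala-Arg-Gly-Val-Cys-Asp-His-Leu-Ile-Leu-Glu


def n_terminal_extensions(seq=SEQ_ID_NO_2):
    """Extensions of 0..len(seq) residues that end at the last residue of
    seq, so the final Gly always abuts Xaa1 (assumed reading of the text)."""
    return [seq[len(seq) - k:] for k in range(len(seq) + 1)]


def c_terminal_extensions(seq=SEQ_ID_NO_3):
    """Extensions of 0..len(seq) residues that start at the first residue
    of seq, so the first Gly always abuts Xaa12 (assumed reading)."""
    return [seq[:k] for k in range(len(seq) + 1)]


def extended_scaffolds(core):
    """All combinations of N- and C-terminal extensions around a core
    scaffold sequence, including the unextended core itself."""
    return [n + core + c
            for n in n_terminal_extensions()
            for c in c_terminal_extensions()]


if __name__ == "__main__":
    core = "<SEQ_ID_NO_1>"  # placeholder for a concrete scaffold sequence
    variants = extended_scaffolds(core)
    print(len(variants))
    print(variants[:3])
```

Under this reading there are 7 choices at the N terminus (no extension or 1 to 6 residues) and 13 at the C terminus (no extension or 1 to 12 residues), giving 91 combinations before any further optional extensions are considered.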

The superscaffold is composed of additional amino acids attached to a scaffold of the invention without adverse effect on the binding site contained therein. A binding protein of the invention is thus preferably composed of a binding site within a scaffold which is attached to a superscaffold. Preferably, the superscaffold is composed of amino acids associated with the scaffold in naturally occurring sources of the scaffold, such as in naturally occurring polypeptides with a CTL-fold. Alternatively, the scaffold may be grafted onto a heterologous superscaffold, such as the superscaffold of another CTL-fold containing polypeptide, analogous to the grafting of mouse antibody CDRs onto a human antibody framework. Amino acid residues of the superscaffold may also serve to permit conjugation of the binding protein to another molecule. Thus the superscaffold may be a polypeptide linker as a non-limiting example. The polypeptide linker may be of differing lengths and compositions.

The superscaffold may also optionally constitute or comprise a dimerization or multimerization domain which permits organization of more than one scaffold in three dimensional space without covalent linkage, or optionally through one or more disulfide bonds in addition to non-covalent interactions. Alternatively, the superscaffold may be a linker molecule or linker polypeptide which covalently links a scaffold to another molecule, such as a second scaffold, which may be the same or different from the first scaffold. Additionally, the superscaffold may comprise a transmembrane region or domain capable of tethering the scaffold in a lipid bilayer, such as at a cell surface. Further still, the superscaffold may be another protein molecule to form a fusion protein comprising a scaffold of the invention.

A further aspect of the invention provides additional scaffolds and binding proteins comprising them. Generally, the scaffold is a CTL-fold containing a region with one or more variable residues, which region starts at the end of the β3 strand (or with the last residue thereof) and continues through any intervening secondary structures until, but preferably not including, the non-solvent exposed residues of, or before the start of, the β5 strand. Thus the scaffold may comprise a variable region represented by the sequence

-Xaa₁-Trp-Xaa₂-Xaa₃-Xaa₄-Xaa₅-Xaa₆-Ser-Xaa₇-Xaa₈-Arg-Xaa₉-Xaa₁₀-Xaa₁₁-Xaa₁₂-Xaa₁₃-Xaa₁₄-Xaa₁₅-Xaa₁₆-Xaa₁₇-Xaa₁₈-Xaa₁₉-Xaa₂₀-Xaa₂₁-Xaa₂₂-Xaa₂₃- (SEQ ID NO:4) wherein each Xaa is independently any amino acid residue but wherein Xaa₅ is preferably Ser, Ala, or Pro, or a conservative substitution of any of these three residues; or Xaa₇ is preferably Gly, Ala, or Leu, or a conservative substitution of any of these three residues; and/or Xaa₈ is preferably Ser, Tyr, Phe, or Trp, or a conservative substitution of any of these four residues; or

SEQ ID NO:4 wherein Xaa₅ is Ser or wherein Xaa₇ is Gly or wherein Xaa₈ is Ser or wherein Xaa₉ is Ala or wherein Xaa₁₀ is Ala or wherein Xaa₁₂ is Trp or wherein Xaa₁₅ is Gly or wherein Xaa₁₆ is Pro or wherein Xaa₁₇ is Ser or wherein Xaa₁₉ is Ser or wherein Xaa₂₁ is Ala or any combination of the foregoing for Xaa₅, Xaa₇, Xaa₈, Xaa₉, Xaa₁₀, Xaa₁₂, Xaa₁₅, Xaa₁₆, Xaa₁₇, Xaa₁₉, and Xaa₂₁. The side chains of the Xaa residues in the above sequences form a binding site, in whole or in part. At each of the N and C terminal ends of the sequences are optional amino acid sequences, or one of the ends is -H (a covalently bonded hydrogen atom), such as those that form a CTL-fold containing the binding site displayed in a solvent exposed portion of the fold.

At the N terminus, these sequences are optionally extended by all or part of SEQ ID NO:2, wherein the extension may be by 1, 2, 3, 4, 5, or all 6 of the consecutive amino acid residues therein linked to Xaa₁ via the carboxyl end of the last Gly residue in SEQ ID NO:2. At the C-terminus, these sequences are also optionally extended by all or part of SEQ ID NO:3, wherein the extension may be by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or all 12 of the consecutive amino acid residues linked to the C terminal Xaa via the amino end of the first Gly residue in SEQ ID NO:3. The sequences may also be extended at both ends by any combination of the above extensions at Xaa₁ and Xaa₂₃ followed by further optional extensions. Where all 12 amino acids of SEQ ID NO:3 are present, preferred embodiments of the invention have no further extension at the C terminus.

SEQ ID NO:4 containing sequences are preferably part of a scaffold as found in the CTL-fold portion of Mtd. Alternatively, the sequences may be substituted for the corresponding sequence between the β3 and β5 strands of another CTL-fold as described herein.

Alternatively, the scaffold may comprise a cyanobacterium derived variable region represented by

-Xaa₁-Trp-Xaa₂-Xaa₃-Xaa₄-Xaa₅-Xaa₆-Xaa₇-Cys-Arg-Ser-Xaa₈-Xaa₉-Arg-Xaa₁₀-Xaa₁₁-Xaa₁₂-Xaa₁₃-Xaa₁₄-Xaa₁₅-Xaa₁₆-Xaa₁₇-Xaa₁₈-Xaa₁₉-Xaa₂₀-Xaa₂₁- (SEQ ID NO:5), optionally with the addition of -Xaa₂₂-, or -Xaa₂₂-Xaa₂₃-, or -Xaa₂₂-Xaa₂₃-Xaa₂₄- at the C terminus end, wherein each Xaa is independently any amino acid residue but wherein Xaa₅ is preferably Ser, Ala, or Pro, or a conservative substitution of any of these three residues; or Xaa₈ is Gly, Ala, or Leu, or a conservative substitution of any of these three residues; and/or Xaa₉ is Ser, Tyr, Phe, or Trp, or a conservative substitution of any of these four residues. Again, the side chains of the Xaa residues in the above sequence form a binding site, in whole or in part. At each of the N and C terminal ends of the sequences are optional amino acid sequences, or one of the ends is -H (a covalently bonded hydrogen atom), such as those that form a CTL-fold containing the binding site displayed in a solvent exposed portion of the fold.

At the N terminus, these sequences are optionally extended by all or part of SEQ ID NO:2, wherein the extension may be by 1, 2, 3, 4, 5, or all 6 of the consecutive amino acid residues therein linked to Xaa₁ via the carboxyl end of the last Gly residue in SEQ ID NO:2. At the C-terminus, these sequences are also optionally extended by all or part of SEQ ID NO:3, wherein the extension may be by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or all 12 of the consecutive amino acid residues linked to the C terminal Xaa via the amino end of the first Gly residue in SEQ ID NO:3. Alternatively, the sequence is extended at the C terminus by all or part of -Gly-Phe-Arg-Leu-Val-Ser-Phe-Pro-Pro-Arg-Thr-Leu-Glu- (SEQ ID NO:6), -Gly-Phe-Arg-Leu-Val-Ser-Phe-Pro-Pro-Arg-Thr-Pro-Glu- (SEQ ID NO:7), -Gly-Phe-Arg-Val-Val-Cys-Ala-Phe-Gly-Arg-Ile-Leu-Gln- (SEQ ID NO:8), or -Gly-Phe-Arg-Val-Val-Cys-Ala-Phe-Gly-Arg-Thr-Phe-Gln- (SEQ ID NO:9), wherein the extension may be by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or all 13 of the consecutive amino acid residues in any one of SEQ ID NOs:6-9 linked to the C terminal Xaa via the amino end of the first Gly residue in each SEQ ID NO. The C terminus extension may also be by -Gly-Phe-Arg-Val-Ile-Ser-Ser-Ser-Pro-Val-Val-Ser-Gly-Phe-His-Ser- (SEQ ID NO:10), wherein the extension may be by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or all 16 of the consecutive amino acid residues linked to the C terminal Xaa via the amino end of the first Gly residue in SEQ ID NO:10; or by -Gly-Cys-Arg-Val-Val-Val-Val-Arg-Gly-Arg-Leu-Ser- (SEQ ID NO:11), wherein the extension may be by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or all 12 of the consecutive amino acid residues linked to the C terminal Xaa via the amino end of the first Gly residue in SEQ ID NO:11.

The sequences may also be extended at both ends by any combination of the above extensions at Xaa₁ and Xaa₂₁ (or Xaa₂₂, Xaa₂₃, or Xaa₂₄) followed by further optional extensions. Where all the amino acids of any of SEQ ID NOs:3 or 6-11 are present, preferred embodiments of the invention have no further extension at the C terminus.

SEQ ID NO:5 containing sequences are preferably part of a scaffold as found in the CTL-fold of a protein containing a cyanobacterium amino acid sequence as shown in FIG. 5. Those cyanobacterium CTL-fold containing proteins are from Trichodesmium erythraeum (preferably T.e. 1A, T.e. 1B, or T.e. 2); Nostoc sp. PCC 7120 (preferably N. PCC. 1, N. PCC. 2A, or N. PCC. 2B); or Nostoc punctiforme (preferably N.p. 1 or N.p. 2) and have both protein level homology (as indicated in FIG. 5) and genetic similarity, because the coding regions for the proteins contain a corresponding TR. Alternatively, the sequences may be substituted for the corresponding sequence between the β3 and β5 strands of another CTL-fold as described herein.

The invention also provides a Treponema denticola derived variable region comprising a sequence represented by (SEQ ID NO:12) -Xaa₁-Arg-Val-Xaa₂-Arg-Gly-Gly-Xaa₃-Trp-Xaa₄-Xaa₅-Xaa₆-Ala-Xaa₇-Xaa₈-Cys-Xaa₉-Val-Gly-Xaa₁₀-Arg-Xaa₁₁-Xaa₁₂-Xaa₁₃-Xaa₁₄-Pro-Xaa₁₅-Xaa₁₆-Xaa₁₇-Xaa₁₈-Xaa₁₉-Xaa₂₀-Leu-,

wherein each Xaa is independently any amino acid residue and the side chains of the Xaa residues in the above sequence form a binding site, in whole or in part. At each of the N and C terminal ends of the sequences are optional amino acid sequences, or one of the ends is -H (a covalently bonded hydrogen atom), such as those that form a CTL-fold containing the binding site displayed in a solvent exposed portion of the fold.

The sequence is optionally extended at the C terminus Leu by one or more residues in -Gly-Phe-Arg-Leu-Ala-Cys-Arg-Pro (SEQ ID NO:13) wherein the extension may be by 1, 2, 3, 4, 5, 6, 7, or all 8 of the consecutive amino acid residues linked to the C terminal Leu via the amino end of the first Gly residue in SEQ ID NO:13. Where all 8 amino acids of SEQ ID NO:13 are present, preferred embodiments of the invention have no further extension at the C terminus.

SEQ ID NO:12 containing sequences are preferably part of a scaffold as found in the CTL-fold of a Treponema denticola protein containing the corresponding T.d. amino acid sequence in FIG. 5. Alternatively, the sequences may be substituted for the corresponding sequence between the β3 and β5 strands of another CTL-fold as described herein.

The invention further provides a scaffold comprising another phage derived variable region represented by (SEQ ID NO:14) -Gly-Gly-Gly-Leu-Trp-Cys-Arg-Asn-Tyr-Gly-Asp-Arg-Phe-Pro-Ile-Arg-Gly-Gly-Xaa₁-Trp-Xaa₂-Xaa₃-Gly-Ser-Xaa₄-Ala-Gly-Leu-Gly-Ala-Leu-Xaa₅-Leu-Xaa₆-Xaa₇-Ala-Arg-Ser-Xaa₈-Ser-Xaa₉-Xaa₁₀-Xaa₁₁-Xaa₁₂-

wherein each Xaa is independently any amino acid residue and the side chains of the Xaa residues in the above sequence form a binding site, in whole or in part. At each of the N and C terminal ends of the sequences are optional amino acid sequences, or one of the ends is -H (a covalently bonded hydrogen atom), such as those that form a CTL-fold containing the binding site displayed in a solvent exposed portion of the fold.

The sequence is optionally extended at the Xaa₁₂ end by one or more residues in -Gly-Phe-Arg-Pro-Ala-Phe-Phe-Val (SEQ ID NO:15) wherein the extension may be by 1, 2, 3, 4, 5, 6, 7, or all 8 of the consecutive amino acid residues linked to Xaa₁₂ via the amino end of the first Gly residue in SEQ ID NO:15. Where all 8 amino acids of SEQ ID NO:15 are present, preferred embodiments of the invention have no further extension at the C terminus.

SEQ ID NO:14 containing sequences are preferably part of a scaffold as found in the CTL-fold of a Vibrio harveyi ML phage protein (ORF35 encoded protein) containing the corresponding V.h. ML amino acid sequence in FIG. 5. Alternatively, the sequences may be substituted for the corresponding sequence between the β3 and β5 strands of another CTL-fold as described herein.

The invention also provides a scaffold comprising a Bifidobacterium longum derived variable region represented by (SEQ ID NO:16) -Xaa₁-Arg-Phe-Gly-Xaa₂-Leu-Xaa₃-Xaa₄-Gly-Ala-Ala-Cys-Gly-Ala-Phe-Ala-Val-Xaa₅-Leu-Xaa₆-Xaa₇-Xaa₈-Leu-Ala-Xaa₉-Arg-Xaa₁₀-Trp-Xaa₁₁-Xaa₁₂-

wherein each Xaa is independently any amino acid residue and the side chains of the Xaa residues in the above sequence form a binding site, in whole or in part. At each of the N and C terminal ends of the sequences are optional amino acid sequences, or one of the ends is -H (a covalently bonded hydrogen atom), such as those that form a CTL-fold containing the binding site displayed in a solvent exposed portion of the fold.

The sequence is optionally extended at the Xaa₁₂ end by one or more residues in -Gly-Gly-Arg-Leu-Ser-Ala-Leu-Gly-Arg-Thr-Lys-Ala (SEQ ID NO:17) wherein the extension may be by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or all 12 of the consecutive amino acid residues linked to Xaa₁₂ via the amino end of the first Gly residue in SEQ ID NO:17. Where all 12 amino acids of SEQ ID NO:17 are present, preferred embodiments of the invention have no further extension at the C terminus.

SEQ ID NO:16 containing sequences are preferably part of a scaffold as found in the CTL-fold of a Bifidobacterium longum protein containing the corresponding B.l. amino acid sequence in FIG. 5. Alternatively, the sequences may be substituted for the corresponding sequence between the β3 and β5 strands of another CTL-fold as described herein.

Additionally, the invention also provides a scaffold comprising a Bacteroides thetaiotaomicron derived variable region represented by (SEQ ID NO:18) -Xaa₁-Gly-Xaa₂-Cys-Trp-Ser-Ala-Val-Pro-Xaa₃-Xaa₄-Xaa₅-Xaa₆-Xaa₇-Gly-Xaa₈-Xaa₉-Leu-Xaa₁₀-Phe-Xaa₁₁-Ser-Ser-Xaa₁₂-Val-Xaa₁₃-Pro-Leu-Xaa₁₄-Xaa₁₅-Xaa₁₆-Xaa₁₇-

wherein each Xaa is independently any amino acid residue and the side chains of the Xaa residues in the above sequence form a binding site, in whole or in part. At each of the N and C terminal ends of the sequences are optional amino acid sequences, or one of the ends is -H (a covalently bonded hydrogen atom), such as those that form a CTL-fold containing the binding site displayed in a solvent exposed portion of the fold.

The sequence is optionally extended at the Xaa₁₇ end by one or more residues in -Arg-Ala-Cys-Gly-Phe-Gly-Leu-Arg-Ser-Ser-Gln-Glu (SEQ ID NO:19) wherein the extension may be by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or all 12 of the consecutive amino acid residues linked to Xaa₁₇ via the amino end of the first Arg residue in SEQ ID NO:19. Where all 12 amino acids of SEQ ID NO:19 are present, preferred embodiments of the invention have no further extension at the C terminus.

SEQ ID NO:18 containing sequences are preferably part of a scaffold as found in the CTL-fold of a Bacteroides thetaiotaomicron protein containing the corresponding B.t. amino acid sequence in FIG. 5. Alternatively, the sequences may be substituted for the corresponding sequence between the β3 and β5 strands of another CTL-fold as described herein.

Additionally, the invention provides for the use of the region between the β3 and β5 strands of a CTL-fold as a variable region in which amino acids may be altered to produce novel binding sites with different specificities and avidities. Thus in an additional aspect of the invention, the nucleic acid sequence encoding the CTL-fold of a CTL-fold containing protein may be operably linked to a template region (TR), and an IMH as needed, wherein the TR corresponds to all or part of the binding site in the CTL-fold and contains adenine residues that direct changes in the amino acid sequence of the binding site, and thus variable region, as described herein. Preferred embodiments of the invention include CTL-fold encoding nucleic acids with the Mtd IMH, or a functional fragment thereof, to direct alterations in the VR based on adenine residues in the functionally linked TR.

A scaffold in a binding protein of the invention is preferably all or part of a CTL-fold that correctly orients the binding site contained therein. Non-limiting examples of CTL-folds include that in Mtd as described herein as well as those classified as C-type lectin-like domains (CTLDs) and divergent CTLDs. Preferred regions of the CTL-fold in Mtd are residues 171-381 and residues 306-381 of SEQ ID NO:20. In the case of residues 171-381, the size is analogous to recombinant single chain antibodies composed of a single variable domain (VHH), which remains a stable polypeptide with the antigen binding capability of the original variable region of the heavy chain (see Nanobodies™ by Ablynx). These VHH are based on antibodies, found in Camelidae (camels and llamas), that lack light chains. In the case of residues 306-381, at least one region composed of residues 171-199, residues 237-263, residues 200-236, or residues 264-305 is preferably present in the fold as well. Particularly preferred is the presence of any two, any three, or all four of these regions.

CTLD examples include those that bind Ca²⁺, such as carbohydrate recognition domains (CRDs), C-type lectin domains (which bind sugars), coagulation factor binding proteins, and IgE Fc receptor. Divergent CTLD examples include type II antifreeze proteins, oxidized LDL receptor, phospholipase receptors, and NK cell receptors (which bind MHC ligands). Other non-limiting examples include link protein modules, endostatin, and intimin. For a review of the C-type lectin fold, see Drickamer, K. “C-type lectin-like domains.” Curr Opin Struct Biol 9, 585-90 (1999).

Preferably, the CTL-fold is bacterial (including bacterial phages), human or mammalian in origin. Non-limiting examples include the selectins (see Lasky (1995) Annu. Rev. Biochem., 64:113-139), including E-selectin, L-selectin and P-selectin; mannose binding protein (MBP), including MBP-A and MBP-C; the natural killer (NK) receptor NKG2D; CD69; eosinophilic major basic protein (EMBP); tumour necrosis factor-stimulated gene-6 product (TSG-6); enteropathogenic E. coli (EPEC) intimin (the D3 domain therein is a CTL-fold); and Yersinia pseudotuberculosis invasin (the D5 domain is a CTL-fold).

An MBP derived variable region of the invention is represented by

-Xaa₁-Xaa₂-Gly-Xaa₃-Trp-Asn-Asp-Xaa₄-Xaa₅-Cys-Xaa₆-Xaa₇-Xaa₈- (SEQ ID NO:21) wherein each Xaa is independently any amino acid residue; or

SEQ ID NO:21 wherein Xaa₁ is Asp or wherein Xaa₂ is Asn or wherein Xaa₃ is Leu, Gln, His, or Lys or wherein Xaa₄ is Ile, Val, or Asp or wherein Xaa₅ is Ser, Pro, Val, or Ala or wherein Xaa₆ is Gln, Asn, Arg, or His or wherein Xaa₇ is Ala, Tyr, Arg, or Lys or wherein Xaa₈ is Ser, Gln, Pro, or Arg or any combination of the foregoing for Xaa₁ to Xaa₈.
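To give a sense of scale, the following sketch counts and enumerates the variants obtained when all eight Xaa positions of SEQ ID NO:21 are simultaneously restricted to the residues listed above. That simultaneous restriction is an assumption made only for illustration, since the text also permits any subset of the restrictions, with unrestricted positions remaining any amino acid. The names `PREFERRED`, `TEMPLATE`, `library_size`, and `variants` are hypothetical.

```python
from itertools import product
from math import prod

# Residues listed in the text for each variable position of SEQ ID NO:21.
PREFERRED = {
    "Xaa1": ["Asp"],
    "Xaa2": ["Asn"],
    "Xaa3": ["Leu", "Gln", "His", "Lys"],
    "Xaa4": ["Ile", "Val", "Asp"],
    "Xaa5": ["Ser", "Pro", "Val", "Ala"],
    "Xaa6": ["Gln", "Asn", "Arg", "His"],
    "Xaa7": ["Ala", "Tyr", "Arg", "Lys"],
    "Xaa8": ["Ser", "Gln", "Pro", "Arg"],
}

# SEQ ID NO:21 with fixed residues spelled out and variable slots named.
TEMPLATE = ["Xaa1", "Xaa2", "Gly", "Xaa3", "Trp", "Asn", "Asp",
            "Xaa4", "Xaa5", "Cys", "Xaa6", "Xaa7", "Xaa8"]


def library_size(preferred):
    """Number of distinct sequences when every slot is restricted."""
    return prod(len(choices) for choices in preferred.values())


def variants(template, preferred):
    """Yield each variant as a hyphen-joined three-letter sequence."""
    slots = [preferred.get(position, [position]) for position in template]
    for combination in product(*slots):
        yield "-".join(combination)


print(library_size(PREFERRED))            # theoretical number of variants
print(next(variants(TEMPLATE, PREFERRED)))  # first enumerated variant
```

Under the stated assumption the count is the product of the per-position choices (1 × 1 × 4 × 3 × 4 × 4 × 4 × 4), which remains small enough to enumerate exhaustively.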

The side chains of the Xaa residues in the above sequences form a binding site, in whole or in part. At each of the N and C terminal ends of the sequences are optional amino acid sequences, or one of the ends is -H (a covalently bonded hydrogen atom), such as those that form a CTL-fold containing the binding site displayed in a solvent exposed portion of the fold.

SEQ ID NO:21 containing sequences are preferably part of a scaffold as found in the CTL-fold of an MBP protein, preferably with a collagenous domain. Alternatively, the sequences may be substituted for the corresponding sequence between the β3 and β5 strands of another CTL-fold as described herein.

A selectin derived variable region of the invention is represented by

-Xaa₁-Xaa₂-Xaa₃-Xaa₄-Xaa₅-Xaa₆-Xaa₇-Gly-Xaa₈-Trp-Asn-Asp-Xaa₉-Xaa₁₀-Cys-Xaa₁₁-Xaa₁₂-Xaa₁₃- (SEQ ID NO:22) wherein each Xaa is independently any amino acid residue; or

SEQ ID NO:22 wherein Xaa₁ is Ile or wherein Xaa₂ is Lys or wherein Xaa₃ is Arg or wherein Xaa₄ is Gln or wherein Xaa₅ is Arg or wherein Xaa₆ is Asp or wherein Xaa₇ is Ser or wherein Xaa₈ is Leu, Gln, His, or Lys or wherein Xaa₉ is Ile, Val, or Asp or wherein Xaa₁₀ is Ser, Pro, Val, or Ala or wherein Xaa₁₁ is Gln, Asn, Arg, or His or wherein Xaa₁₂ is Ala, Tyr, Arg, or Lys or wherein Xaa₁₃ is Ser, Gln, Pro, or Arg or any combination of the foregoing for Xaa₁ to Xaa₁₃.

The side chains of the Xaa residues in the above sequences form a binding site, in whole or in part. At each of the N and C terminal ends of the sequences are optional amino acid sequences, or one of the ends is -H (a covalently bonded hydrogen atom), such as those that form a CTL-fold containing the binding site displayed in a solvent exposed portion of the fold.

SEQ ID NO:22 containing sequences are preferably part of a scaffold as found in the CTL-fold of a selectin protein. Alternatively, the sequences may be substituted for the corresponding sequence between the β3 and β5 strands of another CTL-fold as described herein.

In a further aspect, the invention provides nucleic acid molecules, or polynucleotides, encoding the scaffolds and binding proteins as described herein. The nucleic acids or polynucleotides may be part of a nucleic acid vector or plasmid, optionally in a cell, preferably suitable for expression of the encoded protein. The scaffold is preferably all or part of a variable region (VR) in the nucleic acid molecule which is operably linked to an initiation of mutagenic homing (IMH) sequence and a template region (TR) as described below. Thus nucleic acid molecules encoding the CTL-folds described above, but which do not have operably linked IMH and/or TR components, may be modified to be a nucleic acid molecule of the invention by attachment of the necessary functional nucleic acid components.

The invention also provides a plurality, or library, of scaffolds or binding proteins as well as methods for their production. Thus, a method of producing a plurality of scaffolds or proteins with different binding specificities is disclosed, the method comprising expressing and replicating a nucleic acid molecule or polynucleotide encoding a scaffold or binding protein of the invention in a cell under conditions of mutagenic homing wherein said TR directs mutagenesis of variable residues within the variable region (VR) containing the scaffold. Non-limiting examples of a plurality or library of scaffolds or binding proteins include those expressed as a phage display, ribosome display, polysome display, or cell surface display as well as those presented in an array or microarray format. In some preferred embodiments, the plurality is expressed as part of the tail fibers of Bordetella bacteriophages.

The resultant plurality or library of scaffolds or binding proteins may be screened for binding against a target molecule of interest. The invention provides a method of selecting for binding comprising producing or providing a plurality, or library, of scaffolds or proteins in a plurality of cells as described above followed by selecting proteins which bind a molecule of interest after individually contacting each of said plurality of scaffolds or proteins (or phage particles, cells, or media containing them) with a target molecule of interest. Optionally, the binding proteins in the plurality or library are in dimeric or other multimeric form. The invention also provides for identifying a multimeric form of a binding protein as having a greater avidity for the target molecule of interest than a monomeric form of the protein.

Alternatively, the plurality or library of scaffolds or binding proteins may be screened for binding to any one of a multiplicity of target molecules as an additional method of the invention. The scaffolds or proteins may be contacted with multiple molecules, followed by selection and isolation of those scaffolds or proteins that bind at least one of the target molecules. The multiple target molecules may be in a mixture or disposed on an array or microarray as non-limiting examples. Other such examples include multiple molecules in or on a cell or tissue as well as multiple molecules immobilized on a solid support. The target molecules are preferably polypeptides, optionally modified by glycosylation, phosphorylation, or other post-translational modification; carbohydrates; lipids; or complex combinations thereof. The target molecules may be expressed on the exterior of phage or a virus, or a viable or non-viable cell of any phylum. In some embodiments of the invention, the plurality or library of scaffolds or binding proteins is expressed on the exterior of phage, such as Bordetella bacteriophage.

Where the members of a plurality or library of scaffolds or binding proteins are individually expressed on the exterior of individual phage particles, the invention provides methods of selecting for binding against a target ligand or molecule of interest by use of the plurality or library of phage particles. The plurality, or library, is provided and contacted with a target ligand or molecule of interest followed by selection of phage which bind the ligand or molecule, optionally by removal of phage which do not bind. The selected phage particles may be propagated followed by one or more additional rounds of contacting and selection, optionally under more stringent wash conditions, to “enrich” for phage expressing a scaffold or binding protein with greater affinity or avidity. The polynucleotide encoding the scaffold or binding protein may be isolated from the selected phage and analyzed (e.g. sequenced), amplified or propagated to produce the scaffold or binding protein. In cases of a binding protein, the phage may have been expressing the protein in dimeric, trimeric or other multimeric form. Such selected phage may be used as sources of genes or gene fragments encoding binding protein molecules with the desired specificity and avidity.

The selection methods of the invention may further include an additional determination of the scaffold or binding proteins, selected as described above, as binding or not binding to a second molecule. Scaffolds or binding proteins that bind a second molecule would be identified as non-specific for the target ligand or molecule of interest, while those that do not bind a second molecule would be identified as specific for the target ligand or molecule of interest relative to the second molecule.

The scaffolds and binding proteins of the invention may also be modified, such as by attachment of another moiety thereto. Non-limiting examples of a moiety for attachment include a detectable label or a toxin or activatable pro-drug. Modified scaffolds and binding proteins may be used to target a cell which is bound thereby. As a non-limiting example, a detectably labeled modified scaffold or binding protein may be used to detect a cell expressing a molecule bound by the binding site of the scaffold or protein. The molecule may be expressed on the cell surface, such that the scaffold or binding protein binds the exterior of the cell. The molecule may also be expressed within the cell, wherein the scaffold or binding protein binds after introduction into the interior of the cell, such as, but not limited to, cases where the cells have been permeabilized. Non-limiting examples of cells that may be detected include both prokaryotic and eukaryotic cells, including bacterial cells and higher eukaryotic cells from a multicellular organism.

A modified scaffold or binding protein attached to a toxin, or pro-drug form thereof, may be used to decrease the viability of, or to kill, cells which express a cell surface molecule bound by the modified scaffold or protein. Preferably, the cells are cancer cells, such as those of a mammal, preferably a human.

In additional aspects of the invention, compositions comprising the scaffolds and binding proteins of the invention are provided. The compositions may be used for the practice of the methods disclosed herein, including diagnostic, prophylactic or therapeutic applications. Additionally, compositions comprising the nucleic acid molecules and polypeptides disclosed herein as well as materials for the expression thereof are provided. These compositions may be provided in the form of a kit for the expression and production of the scaffolds and proteins of the invention.

The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the drawings and detailed description, and from the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows the organization of the Bordetella phage DGR containing a single copy of Mtd with its VR followed by a nearly identical (90%), 134-bp direct repeat of the VR called the template repeat (TR), which is invariant among Mtd variants. The amino acid sequence of VR in each of the five Mtd variants is shown in the upper box, together with the predicted amino acid sequence encoded by the corresponding nucleotide triplets of the TR in the lower box. The region corresponding to the initiation of mutagenic homing (IMH) sequence is underlined.

FIG. 2A shows two representations of the intertwined, pyramid-shaped trimer structure of several Mtd variants.

FIG. 2B shows a representation of an Mtd monomer and three domains therein: β-prism, intermediate domain containing the β-sandwich, and C-type lectin (CTL)-fold including the VR and the region corresponding to the IMH.

FIG. 2C is a schematic showing regions of secondary structure in Mtd.

FIG. 3A shows a representation of an Mtd CTL-fold.

FIG. 3B shows a representation of 12 variable residues which are almost all solvent-exposed and organized into a receptor-binding site on the external face of the Mtd β2β3β4β4′ sheet.

FIG. 3C shows a structural comparison of Mtd-P1, -3c, -M1, -I1, and -N1 used to determine that the main chain conformation of the CTL domain is remarkably consistent, despite half of the variable residues being on loop regions.

FIG. 3D shows a representation of Serine-270 (S270) and Glutamate-267 (E267) from the second insert in the Mtd CTL-fold forming hydrogen bonds to the invariant VR residues Serine-351 (S351) and Serine-353 (S353), respectively, within the binding region.

FIG. 3E shows that the β2β3 loop from one monomer hydrogen bonds to the invariant VR residue Arginine-354 (R354) and to main chain (scaffold) atoms of VR.

FIG. 4 shows by means of molecular surface representations that Mtd-P1 (BPP-1) and Mtd-I1 (BIP-1) have highly hydrophobic binding sites, and that the continuity of the hydrophobic surface decreases successively for Mtd-3c (BPP-3), -M1 (BMP-1), and -N1 (BNP). The view is looking onto the base of pyramid-shaped Mtd, that is, the surface that binds the exposed binding surface of the target molecule. The variable amino acid residues (except for 348) are numbered on the surface of BPP-1. The variable and invariant hydrophobic amino acid residues (Ala, Val, Leu, Ile, Phe, Tyr, Trp, and Met) are in green and yellow, respectively; and variable and invariant hydrophilic amino acid residues (Ser, Thr, Asn, Gln, Asp, Glu, His, Lys, Arg, and Cys) are in red and pink, respectively. The surface denoted ‘Invariant’ shows, using the same coloring scheme, the hydrophobic and hydrophilic surface surrounding the variable portion of the binding sites.

FIG. 5 shows the structure-based sequence alignment of the β2β3β4β4′ sheet of the CTL-fold in Mtd-P1 and 12 variable proteins of putative DGRs, as discussed herein. Residues colored light gray correspond to variable residues in Mtd and to those residues found to differ between VR and TR in genomic sequences of the other 12 proteins. Residues colored dark gray are those that could vary by an adenine-directed mechanism in these other proteins. Magenta corresponds to identical residues and yellow to residues conserved in chemical character. In assigning color, the grays take precedence over magenta and yellow, such that certain putatively variable residues are also identical or conserved. Secondary structure elements (box for β-strand, and oval for 3₁₀-helix) for Mtd are denoted above the alignment, and the ‘GGXW’ motif is also denoted. The 12 variable proteins of putative DGRs are from Vibrio harveyi ML phage (V.h. ML); Bifidobacterium longum (B.l.); Bacteroides thetaiotaomicron (B.t.); Treponema denticola (T.d.); Trichodesmium erythraeum 1A (T.e. 1A); Trichodesmium erythraeum 1B (T.e. 1B); Trichodesmium erythraeum #2 (T.e. 2); Nostoc sp. PCC 7120 #1 (N. PCC. 1); Nostoc sp. PCC 7120 #2A (N. PCC. 2A); Nostoc sp. PCC 7120 #2B (N. PCC. 2B); Nostoc punctiforme #1 (N.p. 1); and Nostoc punctiforme #2 (N.p. 2).

DETAILED DESCRIPTION OF MODES OF PRACTICING THE INVENTION

This invention is based in part on X-ray crystal structures of four Mtd variants, each competent to promote infectivity and each having a different receptor specificity (Mtd-P1, -3c, -M1, and -I1). The structure of a fifth Mtd variant from a non-infective phage (see Mtd-N1 in FIG. 1) was also determined. The 1.5 Å resolution structure of Mtd-P1 was determined by multiwavelength anomalous dispersion using seleno-methionine substituted protein, and structures of other Mtd variants were determined by molecular replacement. The overall structures of these variants are nearly identical, indicating that sequence variation within the VR causes no large conformational shifts.

The Mtd variants are all seen to form an intertwined, pyramid-shaped trimer (FIG. 2A). The dimensions of the trimer (height and base of ~90 Å and ~50 Å, respectively) correspond roughly to the size of knobs seen on the ends of Bordetella phage tail fibers (see Liu, M. et al. Genomic and genetic analysis of Bordetella bacteriophages encoding reverse transcriptase-mediated tropism-switching cassettes. J Bacteriol 186, 1503-17 (2004)). The extensive trimer interface buries more than 4,500 Å² of surface area in each monomer, consistent with an obligatory trimer and with trimeric association observed by static light scattering. The majority (69%) of the interface area is composed of non-polar residues. Each polypeptide is also joined to its neighbor via 20 hydrogen bonds, one electrostatic interaction (between Glu-234 and Arg-354), and at least one shared cation (magnesium or calcium at Phe-313 carbonyl).

Mtd is composed of three domains (see FIG. 2B). At the apex of the pyramid, the N-terminal domains (residues 1-48) of each of the three monomers form a threefold symmetric β-prism, with each monomer contributing a four-stranded, antiparallel β-sheet flanked by a short α-helix. The β-prism is structurally similar to the pseudo-threefold symmetric β-prisms observed in monocot lectins (rmsd 2.4 Å, 60 Cα atoms; see Hester, G., Kaku, H., et al. Structure of mannose-specific snowdrop (Galanthus nivalis) lectin is representative of a new plant lectin family. Nat Struct Biol 2, 472-9 (1995)). However, the Mtd β-prism does not contain the spatial arrangement of residues required in monocot lectins which bind carbohydrates without a CTL-fold.

The β-prism domain of each Mtd monomer is joined to the following intermediate domain by a short 3₁₀-helix (residues 49-54), which intertwines with equivalent 3₁₀-helices from other monomers. These connections cross such that the β-prism domain occupies a different face of the pyramid than the other domains.

In contrast to the intimate trimeric association of the β-prism domain, the intermediate domain (residues 56-170) splays away from the trimer axis and makes little contact to other monomers. The intermediate domain is formed by an elaborated β-sandwich containing three- and four-stranded antiparallel sheets and with the three-stranded sheet making a near right-angle turn near its middle (see FIG. 2B). The structure of the intermediate domain appears to constitute a novel fold. Without being bound by theory, and offered to advance understanding of the invention, the N-terminal β-prism or intermediate β-sandwich domains are theorized to permit association of the individual monomers with each other as well as being possibly involved in tethering Mtd to the surface of Bordetella phage.

The superscaffold of the proteins of the invention may thus include all or part of one or both of the β-prism and intermediate domains of Mtd, where the Mtd CTL-fold contains one scaffold of the invention. These superscaffold domains may be used to arrange and display the binding site of a scaffold of the invention as described herein.

The Mtd C-terminal domain (residues 171-381), which constitutes more than half of Mtd and contains the VR, is unexpectedly found to have a C-type lectin (CTL)-fold (see Weis, W. I., et al. Structure of the calcium-dependent lectin domain from a rat mannose-binding protein determined by MAD phasing. Science 254, 1608-15 (1991); Drickamer, K. C-type lectin-like domains. Curr Opin Struct Biol 9, 585-90 (1999); and Holm, L. et al. Protein structure comparison by alignment of distance matrices. J Mol Biol 233, 123-38 (1993)). See FIG. 3A. Although originally named for calcium-dependent carbohydrate binding in mammalian mannose binding protein (MMBP, see Weis, W. I., et al. Structure of a C-type mannose-binding protein complexed with an oligosaccharide. Nature 360, 127-34 (1992)), different individual CTL-folds have been recognized to bind different ligands.

The similarity of Mtd to carbohydrate-binding CTL proteins, such as MMBP (1.5 Å rmsd, 60 Cα atoms), appears to be the result of convergent evolution. None of the 14 residues absolutely conserved in carbohydrate-binding CTL domains is found in Mtd, and neither are the residues required for calcium- and carbohydrate-binding. Likewise, none of the four disulfide-bond forming cysteines found in many CTL domains is found in Mtd, confirming that disulfides are not required for stability of CTL-folds. Furthermore, Mtd has no obvious amino acid sequence relationship to other convergently evolved CTL domains, such as the E. coli virulence factor intimin, but does have structural similarity as expected (rmsd 1.8 Å, 75 Cα atoms).

The typical distinguishing features of the ~110-130 residue CTL-fold, as also seen in Mtd, are a two-stranded antiparallel β-sheet formed by the domain's N- and C-termini (β1β5) connected by two α-helices to a three-stranded, antiparallel β-sheet (β2β3β4), see FIG. 3A. These features are also generally present in other CTL-folds, which range from about 95 to about 150 residues, described herein for use in the practice of the invention. The β2 strand is uniquely twisted in Mtd such that it crosses over the β3 strand. Unique to Mtd are inserts (residues 200-236 and 264-305) that interrupt connections between β1 and α1 and between α2 and β2, respectively, as well as some additional short strands (β0 and β4′). The inserts have no regular secondary structure but do have specific conformations due to an extensive hydrogen bonding network, including to residues within the binding site. Without being bound by theory, and offered to advance the understanding of the present invention, it is possible that the inserts stabilize the VR as discussed below. As noted above, the Mtd CTL-fold, and other analogous CTL-folds of similar structural arrangement, may be used as a scaffold in the practice of the present invention.

The Mtd CTL-fold contains 12 residues that are variable. The 12 variable residues are almost all solvent-exposed and organized into a receptor-binding site on the external face of the β2β3β4β4′ sheet (FIG. 3B). This face is equivalent to the one in the CTL-fold proteins Ly49A (see Tormo, J., et al. Crystal structure of a lectin-like natural killer cell receptor bound to its MHC class I ligand. Nature 402, 623-31 (1999)) and intimin (Luo, Y. et al. Crystal structure of enteropathogenic Escherichia coli intimin-receptor complex. Nature 405, 1073-7 (2000); and Batchelor, M. et al. Structural basis for recognition of the translocated intimin receptor (Tir) by intimin from enteropathogenic Escherichia coli. EMBO J 19, 2452-64 (2000)) responsible for interaction with their respective targets, class I MHC molecules and Tir. Half of the 12 variable residues are located on regular secondary structure elements: three are located on β-strands (357 on β4; 368 and 369 on β4′), and three on a 3₁₀-helix that connects β3 to β4 (347, 348, and 350), see FIG. 3B. The other half of the variable residues occupy loop positions preceding the 3₁₀-helix (344 and 346) or connecting β4 to β4′ (359, 360, 364, and 366).

All variable residues, except for 348 and 369, are encoded by AAC codons in TR. Adenine-directed mutagenesis permits substitution of Asn encoded by AAC with 14 other residues, which cover the gamut of chemical character. For example, while adenine substitution of AAC cannot produce a codon for Trp, it can produce codons for Phe and Tyr. Likewise, while substitution cannot produce codons for Glu and Lys, it can produce codons for Asp and Arg (also His). Significantly, the use of the AAC codon rules out a nonsense codon being introduced. Adenine-substitution of the two non-AAC codons in TR, ACG encoding Thr-348 and ATC encoding Ile-369, can produce three other amino acids (Ser, Pro, Ala at 348; Val, Leu, Phe at 369). There appears to be no structural necessity for residue 348 to be small, but 369 is preferably hydrophobic to pack between the invariant residues Trp-307 and Trp-309 (FIG. 3B).
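The codon arithmetic above can be checked with a short sketch. It assumes a simplified model in which each adenine in a TR codon may be replaced by any of the four bases while non-adenine positions stay fixed, and it uses the standard genetic code; the function names are hypothetical and the model is offered only as an illustration, not as a description of the underlying mechanism.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def adenine_variants(codon):
    """All codons reachable when every A may become any base (assumed model)."""
    options = [BASES if base == "A" else base for base in codon]
    return {"".join(choice) for choice in product(*options)}


def encoded_residues(codon):
    """One-letter residues (or '*' for stop) encoded by the reachable codons."""
    return {CODON_TABLE[variant] for variant in adenine_variants(codon)}


for codon in ("AAC", "ACG", "ATC"):
    original = CODON_TABLE[codon]
    others = sorted(encoded_residues(codon) - {original})
    print(codon, original, len(others), "".join(others))
```

Under this model AAC reaches 16 codons of the form NNC, none of which is a stop codon, while ACG and ATC each reach only four codons because they contain a single adenine.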

Along with these variable residues, the binding site in Mtd contains four invariant, solvent-exposed aromatic residues that are likely to contribute to interactions despite their status as amino acid residues of a scaffold as described herein. These are Trp-307 and Trp-345 at the center and periphery, respectively, of the binding site. Also at the periphery are the invariant residues Tyr-322 and Tyr-333, which come from the intertwining of an adjacent monomer's β2β3 loop into a neighbor's binding site (FIG. 3B). Altogether, the binding site including the variable and above invariant residues in Mtd-P1 presents ~900 Å² of exposed surface area.

In the practice of the invention, it is contemplated that “conservative amino acid substitutions” may be favored due to the interchangeability of residues having similar side chains. Thus amino acids may be grouped based upon the similarities of their side chains and substituted for each other on this basis. For example, a group of amino acids having aliphatic side chains is glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains is serine and threonine; a group of amino acids having amide-containing side chains is asparagine and glutamine; a group of amino acids having aromatic side chains is phenylalanine, tyrosine, and tryptophan; a group of amino acids having basic side chains is lysine, arginine, and histidine; and a group of amino acids having sulfur-containing side chains is cysteine and methionine. The invention provides for the “conservative substitution” of one amino acid residue in a group by another amino acid residue in the same group. Other conservative amino acid substitution groups include, but are not limited to, valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine, alanine-valine, and asparagine-glutamine.
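The side-chain groups recited above can be expressed as a simple lookup, shown here as a minimal sketch. Only the six primary groups are encoded; the dictionary keys and the function name `is_conservative` are hypothetical labels chosen for illustration, and the additional narrower groups mentioned in the text are omitted.

```python
# Side-chain groups as listed in the text (three-letter residue codes).
SIDE_CHAIN_GROUPS = {
    "aliphatic": {"Gly", "Ala", "Val", "Leu", "Ile"},
    "aliphatic-hydroxyl": {"Ser", "Thr"},
    "amide-containing": {"Asn", "Gln"},
    "aromatic": {"Phe", "Tyr", "Trp"},
    "basic": {"Lys", "Arg", "His"},
    "sulfur-containing": {"Cys", "Met"},
}


def is_conservative(original, replacement):
    """True if the two distinct residues share one of the listed groups."""
    if original == replacement:
        return False  # not a substitution at all
    return any(
        original in group and replacement in group
        for group in SIDE_CHAIN_GROUPS.values()
    )


print(is_conservative("Lys", "Arg"))  # both basic
print(is_conservative("Asp", "Lys"))  # Asp is in none of the listed groups
```

Residues absent from every listed group, such as Asp, Glu, and Pro, are never reported as conservative by this sketch, which reflects the groups as recited rather than any broader substitution matrix.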

The final portion of VR, the β5 strand, is encoded by the ‘initiation of mutagenic homing’ (IMH) sequence, which maintains the unidirectional flow of mutagenized genetic information from TR to VR. This region of VR is unaffected by adenine-directed mutagenesis and therefore invariant. Invariance at the nucleotide level is echoed at the protein level among Mtd variants, with β5 making close intra- and inter-molecular contacts within the central core of the trimer that would be potentially disrupted by variation. Thus all or part of this IMH-encoded β5 strand of the protein may be part of a superscaffold as described herein while the nucleic acid encoding the β5 strand, or a portion thereof, serves as the IMH, which maintains the unidirectional flow of diversity generating information from TR to VR.

Based in part on the foregoing, the present invention provides a binding protein comprising a scaffold for presentation of a binding site with variable residues as described herein. In a broad sense, the scaffolds and binding proteins of the invention may be substituted for antibodies, and antigen binding fragments thereof, or other affinity agents in detection or other affinity-based assays or in therapeutics as known in the art.

In preferred embodiments, the scaffold comprises all or part of a CTLD, the Mtd CTL-fold, or an Mtd-like CTL-fold. In the case of the Mtd CTL-fold, the scaffold would permit possible variation at one or more of the 12 variable residues described herein. Alternatively, the scaffold comprises all or part of another CTL-fold, including those of microbial proteins as described herein (see FIG. 5 and Example 3) as well as those of a selectin; MBP; NKG2D; CD69; EMBP; TSG-6; and intimin as described herein. By “binding site”, it is meant the side chains of variable residues which define, in whole or in part, the three dimensional structure or shape which permits binding of the polypeptide attached to the side chains (through the alpha carbons of each variable residue) to a target molecule. Thus a scaffold is a polypeptide which functionally presents the binding site defining variable residues (contained in said polypeptide) to interact with a target molecule bound by the binding site. Scaffolds of the invention that contain a binding site that is functionally presented to bind a target molecule are thus analogous to an Fv region of an antibody molecule and so may be used in analogous ways. As a non-limiting example, a scaffold of the invention may be conjugated to another molecule as described herein, such as to form a fusion protein or to form a labeled scaffold. The scaffolds of the invention may also be viewed as comprising a variable region which contains a binding site of the invention.

The relationship between a binding site, and thus a scaffold or binding protein of the invention, and a “target molecule” as used herein may also be described as the relationship between the members of a binding pair, wherein one member of the pair has an area on its surface or in a portion thereof which binds to the other member of the pair. The relationship may also be described as that between members of a specific binding pair, wherein one member of the pair has an area on its surface or in a portion thereof which specifically binds to the other member of the pair. The members of a pair may be referred to as ligand and anti-ligand (or ligand and receptor), either of which may be the scaffold or binding protein of the invention. The members of a pair are exemplified by other known, non-limiting examples, including antibody and antigen or hapten; biotin and avidin (or streptavidin); hormone and hormone receptor; immunoglobulin and protein A; and phosphorylated serine residues and annexin. Thus a scaffold or binding protein of the invention may be viewed as a receptor that binds a ligand as the molecule of interest, or as a ligand that is bound by a receptor as the molecule of interest.

Preferably, a scaffold of the invention is at least about 40 amino acidresidues. The scaffold may also be about 45, about 50, about 55, about60, about 65, about 70, about 75, about 80, about 85, about 90, about100, about 110, about 120, about 130, about 140, about 150, about 160,about 170, about 180, about 190, about 200, about 220, or about 230 ormore amino acid residues.

The scaffold in a binding protein of the invention is also preferably in the C-terminal half of the protein. More preferred is where the scaffold is within about 100, about 75, about 50, about 40, about 30, about 20, or about 10 amino acid residues of the C-terminus of the protein.

Scaffolds containing a binding site may also be conjugated to a superscaffold as described herein to form a binding protein of the invention. A superscaffold of the invention of course does not interfere with the presentation of the binding site by the scaffold, although as explained herein, the superscaffold can serve to permit multimerization of scaffolds, and thus multimerization of binding sites in order to effect high avidity of the binding site comprised of multiple identical or non-identical lower affinity binding sites. Alternatively, the superscaffold can serve as a means, or a linker, to permit conjugation of another molecule to the scaffold and thus binding site through the structure of the superscaffold.

The amino acid sequences that form the superscaffold are preferably those of non-CTL-fold regions naturally occurring in association with a CTL-fold. One non-limiting example is residues 1-170 of Mtd (SEQ ID NO:20). Other non-limiting examples include the oligomerization domains described by Drickamer (Ibid), including α-helical domains of mannose-binding protein (MBP), which domains form trimeric coiled coils; the β strand from the N terminus of the MBP CRD, optionally with the C-terminal β strand of the CRD and the C-terminal end of helix α2, which dimerize MBP when the α-helical coiled coil domain is absent; the N-terminal β strands of the Polyandrocarpa lectin, optionally with helix α2; and loops from factors IX and X which permit the formation of a “head to head” interaction between two CTLDs with optional stabilization by an interchain disulfide bond. Of course the resultant multimers may be homomultimers, composed of scaffolds with the same binding activity, or heteromultimers, composed of scaffolds with more than one binding activity. Thus the invention provides for homodimers, heterodimers, homotrimers, heterotrimers, as well as higher orders of homomeric and heteromeric proteins. Further non-limiting examples include the transmembrane and domains D0, D1, and/or D2 of EPEC intimin as well as the four Ig-like domains (D1-D4) of Y. pseudotuberculosis invasin.

The binding proteins of the invention are thus made up of at least a scaffold containing a binding site as described herein. This combination may be non-naturally occurring in the sense that the binding site may be part of a variable region derived from a first CTL-fold that is inserted into the corresponding region of a second, and different, CTL-fold. Thus, as a non-limiting example, the Mtd based binding site may be inserted in place of the corresponding region between the β3 and β5 strands of another CTL-fold as described herein. The binding proteins of the invention may thus be considered “recombinant”. Additional “recombinant” binding proteins include those comprising a superscaffold attached to the scaffold wherein the superscaffold is not derived from the same protein as the scaffold. The polypeptide sequence of the superscaffold is preferably that attached to a CTL-fold containing protein described herein. Further “recombinant” binding proteins include the multimeric forms of a superscaffold containing binding protein wherein the subunits of the multimeric form may be the same (to result in a homomultimer) or different (to result in a heteromultimer).

Preferably, a scaffold or binding protein of the invention is not an isolated form of a naturally occurring polypeptide, where isolated refers to a state of being substantially removed from, preferably entirely removed from, other polypeptides or biomolecules that are normally found with a naturally occurring polypeptide. A naturally occurring polypeptide is one produced by a living organism in the absence of manipulation or modification by human intervention. Non-limiting examples of human intervention include recombinant DNA methodology, mutagenesis by chemical or physical means, inhibition of DNA repair, or manipulation of genetics. Stated differently, the binding proteins of the invention are preferably recombinant proteins or otherwise the result of human intervention. Thus a scaffold or binding protein produced by the recombinant methods described herein is not a naturally occurring polypeptide.

The term “recombinant” refers to the alteration of a native nucleic acid or protein, or its modification by the introduction of a heterologous nucleic acid or protein, via human intervention. The term may also refer to a cell so modified or to a cell derived from a cell so modified. As a non-limiting example, recombinant cells express genes that are not found within the native (nonrecombinant) form of the cell or express native genes in an unnaturally overexpressed, under-expressed, or not expressed state.

Preferred embodiments of the invention thus do not include naturally occurring Mtd proteins, such as those with SEQ ID NO:20 (Mtd-P1 or Bordetella phage BPP-1) or variations thereof having the amino acid sequences of Mtd-P3c, Mtd-M1, Mtd-I1, or Mtd-U1. Naturally occurring selectins; MBPs; NKG2D; CD69; EMBP; TSG-6; and intimin as well as naturally occurring sequences of CTL-fold containing proteins from Vibrio harveyi ML phage (V.h. ML); Bifidobacterium longum (B.l.); Bacteroides thetaiotaomicron (B.t.); Treponema denticola (T.d.); Trichodesmium erythraeum 1A (T.e. 1A); Trichodesmium erythraeum 1B (T.e. 1B); Trichodesmium erythraeum #2 (T.e. 2); Nostoc sp. PCC 7120 #1 (N. PCC. 1); Nostoc sp. PCC 7120 #2A (N. PCC. 2A); Nostoc sp. PCC 7120 #2B (N. PCC. 2B); Nostoc punctiforme #1 (N.p. 1); and Nostoc punctiforme #2 (N.p. 2) having the corresponding sequences shown in FIG. 5 are also preferably not part of the present invention. These proteins are, however, disclosed as providing variable regions between the β3 and β5 strands of the CTL-fold contained therein for use in the presentation of a binding site as described herein. These proteins are also disclosed as providing CTL-folds for use with the binding sites and variable regions as described herein.

The invention also provides polynucleotides encoding the scaffolds and binding proteins described herein. The polynucleotides are preferably operably linked to a regulatory nucleic acid sequence that controls or regulates the expression of the coding polynucleotide in a cell or cell extract. A regulatory sequence refers to regions or sequences located upstream and/or downstream from the start of transcription that are involved in recognition and binding of RNA polymerase and other proteins to initiate transcription. The term includes a promoter for regulating the start of transcription.

The polynucleotide may be part of a vector or plasmid used to propagate or amplify the polynucleotide. Where the polynucleotide is operably linked to a regulatory nucleic acid sequence, presence in a vector or plasmid permits the expression of the encoded scaffold or binding protein. This permits production and isolation of large quantities of a scaffold or binding protein of the invention.

Alternatively, the polynucleotide and regulatory sequence are operably linked to other sequences to form a diversity-generating retroelement (DGR) as described herein such that the variable residues of the binding site in the scaffold or binding protein may be readily diversified via a DGR. While embodiments of the invention based upon the nucleic acids encoding the sequences shown in FIG. 5 are readily used to diversify the binding sites contained therein, this aspect of the invention is advantageously applied to other CTL-folds and the binding sites contained therein where the region between the β3 and β5 strands is not a variable region until operably linked to a TR (and an IMH if necessary), as well as any other necessary components in cis or in trans, like reverse transcriptase activity as a non-limiting example, wherein the TR directs alterations of amino acid residues of the binding site, and thus variable region, as described herein. Of course this means of creating alterations in the binding site is limited by adenine directed mutagenesis as described herein. But the invention also contemplates the use of traditional mutagenesis techniques for altering the binding specificity of the region between the β3 and β5 strands of a CTL-fold as described herein.

The polynucleotide, preferably as part of a DGR, may also be part of a phage or bacterial genome and expressed on the surface of phage or bacteria. DGR as used herein includes the use of mutagenic homing wherein an IMH directs mutagenesis of variable residues within the variable region (VR) of a scaffold or binding protein of the invention through a functionally linked TR, which directs alterations of nucleotide residues in the VR based on the locations of adenine residues at corresponding positions in the related TR sequence, as well as any other necessary components in cis or in trans, like reverse transcriptase activity as a non-limiting example. Use of a DGR advantageously permits use of the phage or bacteria to form a library expressing a heterogeneous population of encoded scaffolds or binding proteins on the surfaces of individual organisms. The use of “population” refers to a plurality of heterogeneous members which have similarities but at least two of which have different binding sites as described herein.
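The adenine-directed alteration described above can be illustrated with a minimal Python sketch. It is a toy model only, not the biochemical mechanism: it assumes that every adenine position in the TR may be replaced by any of the four nucleotides in the corresponding VR position, and that all other positions are copied unchanged. The TR fragment and the function names are hypothetical and are not taken from any sequence disclosed herein.

```python
import random

NUCLEOTIDES = "ACGT"


def variable_positions(tr: str) -> list:
    """Indices of adenines in the TR; in this toy model only these sites may vary in the VR."""
    return [i for i, base in enumerate(tr.upper()) if base == "A"]


def diversify_vr(tr: str, rng: random.Random) -> str:
    """Copy the TR into a new VR, redrawing each adenine site uniformly at random (an assumption)."""
    return "".join(
        rng.choice(NUCLEOTIDES) if base == "A" else base
        for base in tr.upper()
    )


if __name__ == "__main__":
    rng = random.Random(0)           # seeded so the sketch is repeatable
    tr = "GGCAACTCCAACGGT"           # hypothetical TR fragment, for illustration only
    print("adenine sites:", variable_positions(tr))
    for _ in range(3):
        print(diversify_vr(tr, rng))  # three simulated VR variants
```

Because non-adenine positions are never changed, the simulated variants differ from one another only at the sites reported by `variable_positions`, which mirrors the limitation to adenine directed mutagenesis noted above.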

A diversified population of phage may be used in a method to identify a scaffold or binding protein as binding to a target molecule of interest. Non-limiting examples of such target molecules include a cell surface molecule, optionally of a cancer cell, an epithelial cell, an endothelial cell, and a bacterial or fungal cell surface molecule. In some embodiments of the invention, the scaffold or binding protein is expressed as part of the tail fiber in a bacteriophage particle.

Such a method may comprise expressing a population of scaffolds or binding proteins on the surfaces of members of a library of phage particles (including as part of the tail fiber(s)), of bacteria, or of other cells; contacting the members of the library with a target molecule of interest, optionally immobilized; removing members that do not bind to the target; and selecting the library member(s) that bind the target molecule of interest. Optionally, the selected members can be propagated to form another library of members for an additional round of screening or selection using the above method. This permits the enrichment of library member(s) that bind the target of interest and also provides a means to verify the selected member(s) as binding the target. In some embodiments of the invention, the method further comprises isolating polynucleotides from the selected member(s). The phage library members are one form of a plurality, or family, of scaffolds or binding proteins of the invention.
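The contact, removal, selection, and propagation steps above can be sketched as a simple simulation. This is an illustrative model under stated assumptions, not a laboratory protocol: each library member is given a hypothetical probability of being retained on the target, and propagation is modeled as copying each retained member a fixed number of times. The names `panning_round`, `capture_prob`, and `amplification` are invented for this sketch.

```python
import random
from collections import Counter


def panning_round(library, capture_prob, rng, amplification=10):
    """One round: contact the target, remove non-binders, propagate what remains."""
    # Members stay on the target with their (hypothetical) capture probability.
    retained = [m for m in library if rng.random() < capture_prob[m]]
    # Propagation is modeled as simple copying of each retained member.
    amplified = retained * amplification
    # Keep the library from growing beyond its starting size.
    if len(amplified) > len(library):
        amplified = rng.sample(amplified, len(library))
    return amplified


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical starting library: a rare strong binder among weak binders.
    library = ["strong"] * 10 + ["weak"] * 990
    capture_prob = {"strong": 0.9, "weak": 0.01}
    for round_number in range(1, 4):
        library = panning_round(library, capture_prob, rng)
        print(round_number, Counter(library))
```

Repeating the round shows why additional rounds of screening enrich for binders: the fraction of strongly binding members rises with each pass, subject to the assumed capture probabilities.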

A selected or identified scaffold or binding protein may also be “evolved” by a variation of the above to select for enhanced binding to the same ligand or binding to a different ligand. One method for evolving a previously identified or selected scaffold or binding protein is to provide a polynucleotide encoding the scaffold or binding protein; allow it to undergo diversification as described herein to produce a library of variants; and select for a member of the library with enhanced binding to the same target molecule or with “gain of function” binding to another target molecule.

Of course chemically or genetically known target molecules or unknown target molecules may be used to select or identify a scaffold or binding protein of the invention. Prior information regarding a target molecule's structure is not required to isolate a scaffold or binding protein that binds it. Preferably, the scaffold or binding protein will display specific binding affinity for a particular target, optionally with the functionality of blocking the binding of one or more other molecules to the target molecule. In the case of a cell surface ligand, the scaffold or binding protein may also be able to stimulate or inhibit a metabolic pathway, to act as a signal or messenger, or to stimulate or inhibit cellular activity. A scaffold or binding protein can thus be used as an antagonist, an agonist, as well as a modulator of a cell surface ligand function. A scaffold or binding protein for an “orphan” receptor to which no natural ligand is known may also be generated.

Unless otherwise defined herein, the use of “specifically binds” or “selectively binds” with respect to a scaffold or binding protein herein refers to binding interactions between the scaffold or binding protein and a first molecular entity that occurs to the exclusion of interactions with a second molecular entity present with the first in a heterogeneous population of molecules or other biological materials. Generally, a scaffold or binding protein of the invention binds to a target molecule better by at least about 2×, more preferably about 5× or about 10×, than binding to background molecules that are present or used as non-specific control targets.
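The fold thresholds above can be expressed as a short calculation. The sketch below assumes that binding to the target and to a non-specific control has been measured in the same arbitrary signal units, which is an assumption of this example rather than a requirement stated herein; the function names and the sample readings are hypothetical.

```python
def fold_over_background(target_signal: float, background_signal: float) -> float:
    """Ratio of the binding signal on the target to the signal on a non-specific control."""
    if background_signal <= 0:
        raise ValueError("background signal must be positive to form a ratio")
    return target_signal / background_signal


def specificity_tier(fold: float) -> str:
    """Map a fold value onto the approximate 2x, 5x, and 10x levels named in the text."""
    if fold >= 10:
        return ">=10x"
    if fold >= 5:
        return ">=5x"
    if fold >= 2:
        return ">=2x"
    return "below 2x"


if __name__ == "__main__":
    # Hypothetical readings in arbitrary units.
    fold = fold_over_background(target_signal=50.0, background_signal=10.0)
    print(fold, specificity_tier(fold))
```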

The scaffolds and binding proteins of the invention may also be modified, such as by attachment of another moiety thereto. In some embodiments of the invention, the moiety may be a label, optionally a detectable label, including a directly detectable label such as a radioactive isotope, a fluorescent label (Cy3 and Cy5 as non-limiting examples) or a particulate label. Non-limiting examples of particulate labels include latex particles and colloidal gold particles. Alternatively, the label may be for indirect detection. Non-limiting examples include an enzyme, such as, but not limited to, luciferase, alkaline phosphatase, and horseradish peroxidase. Other non-limiting examples include a molecule bound by another molecule, such as, but not limited to, biotin, the Fc portion of an antibody, an affinity peptide, or a purification tag. Preferably, the label is covalently attached. The scaffold or binding protein may also be selected to bind antibodies from specific animals, e.g., goat, rabbit, mouse, etc., for use as a secondary reagent in assays using such antibodies as the primary detection agent.

Alternatively, a scaffold or binding protein of the invention may be detected directly by use of a reagent that binds thereto. Non-limiting examples include an antibody, or functional fragment thereof, that binds a portion of the scaffold without interference of the binding site or that binds a portion of the superscaffold without interfering with the binding site. Such an antibody or fragment thereof is preferably labeled for detection as described herein and as known in the art. Alternatively, a ligand for a portion of the scaffold or the superscaffold, which binds to a region distinct from, and without interference to, the binding site may be used. The ligand is also preferably labeled for detection as provided herein and known in the art.

Detection of a scaffold or binding protein of the invention may be advantageously used to detect the presence of a target molecule bound by the scaffold or binding protein. Such detection may also be used to detect the presence of a cell that expresses the ligand or molecule. Non-limiting detection assays to which the invention may be adapted include flow cytometry and fluorescence microscopy.

As an alternative non-limiting example, a labeled scaffold or binding protein of the invention which specifically binds human chorionic gonadotropin (hCG), to the exclusion of other factors that are normally found therewith, may be used to detect hCG in human urine samples as an indicator of pregnancy, such as by use of a lateral flow device as known in the art. Alternatively, a labeled scaffold or binding protein of the invention may be used to detect a microorganism, such as pathogenic bacteria or fungi, by binding to a cell surface molecule specific to the microorganism of interest, relative to other organisms normally found therewith.

Thus the invention also provides a method of detecting a cell, the method comprising contacting the cell with a scaffold or binding protein of the invention which binds a cell surface molecule specific to the cell and subsequently detecting the bound scaffold or binding protein. Preferably, the cell is a bacterial or fungal cell, particularly pathogenic forms thereof. Alternatively, the cell may be associated with a disease or other unwanted condition, including, but not limited to, a cancer cell or a virally infected cell.

Therefore, the invention provides for the use of a scaffold or binding protein as disclosed herein as a diagnostic agent, either in vitro or in vivo, based on its ability to bind to a tissue or disease associated target molecule. Tissue associated molecules are those that are expressed exclusively, or at a significantly higher level, in one or more tissue(s) compared to other tissues in an animal. Disease associated molecules are those that are expressed exclusively, or at a significantly higher level, in one or more diseased cells, diseased tissues, or bodily fluid in comparison to non-diseased cells, tissues, or fluids in an organism.

Non-limiting tissue or disease associated molecules are discussed in Tables I and II of U.S. Patent Publication No. 2002/0107215. Non-limiting examples of tissues where target ligands bound by the scaffolds and binding proteins of the invention may be found include liver, pancreas, adrenal gland, thyroid, salivary gland, pituitary gland, brain, spinal cord, lung, heart, breast, skeletal muscle, bone marrow, thymus, spleen, lymph node, colorectal, stomach, ovarian, small intestine, uterus, placenta, prostate, testis, colon, gastric, bladder, trachea, kidney, and adipose tissue. Other non-limiting examples include tumor cells, tumor tissue sample, organ cells, blood cells, and cells of the skin, lung, heart, muscle, brain, mucosae, liver, intestine, spleen, stomach, lymphatic system, cervix, vagina, prostate, mouth, and tongue.

Non-limiting examples of diseases include an autoimmune/inflammatory disorder such as acquired immunodeficiency syndrome (AIDS), Addison's disease, adult respiratory distress syndrome, allergies, ankylosing spondylitis, amyloidosis, anemia, asthma, atherosclerosis, autoimmune hemolytic anemia, autoimmune thyroiditis, autoimmune polyendocrinopathy-candidiasis-ectodermal dystrophy (APECED), bronchitis, cholecystitis, contact dermatitis, Crohn's disease, atopic dermatitis, dermatomyositis, diabetes mellitus, emphysema, episodic lymphopenia with lymphocytotoxins, erythroblastosis fetalis, erythema nodosum, atrophic gastritis, glomerulonephritis, Goodpasture's syndrome, gout, Graves' disease, Hashimoto's thyroiditis, hypereosinophilia, irritable bowel syndrome, multiple sclerosis, myasthenia gravis, myocardial or pericardial inflammation, osteoarthritis, osteoporosis, pancreatitis, polymyositis, psoriasis, Reiter's syndrome, rheumatoid arthritis, scleroderma, Sjogren's syndrome, systemic anaphylaxis, systemic lupus erythematosus, systemic sclerosis, thrombocytopenic purpura, ulcerative colitis, uveitis, Werner syndrome, complications of cancer, hemodialysis, and extracorporeal circulation, viral, bacterial, fungal, parasitic, protozoal, and helminthic infections, and trauma; a cell proliferative disorder such as actinic keratosis, arteriosclerosis, atherosclerosis, bursitis, cirrhosis, hepatitis, mixed connective tissue disease (MCTD), myelofibrosis, paroxysmal nocturnal hemoglobinuria, polycythemia vera, psoriasis, primary thrombocythemia; cancers including adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, sarcoma, teratocarcinoma, and, in particular, a cancer of the adrenal gland, bladder, bone, bone marrow, brain, breast, cervix, gall bladder, ganglia, gastrointestinal tract, heart, kidney, liver, lung, muscle, ovary, pancreas, parathyroid, penis, prostate, salivary glands, skin, spleen, testis, thymus, thyroid, and uterus; a neurological disorder such as epilepsy, ischemic cerebrovascular disease, stroke, cerebral neoplasms, Alzheimer's disease, Pick's disease, Huntington's disease, dementia, Parkinson's disease and other extrapyramidal disorders, amyotrophic lateral sclerosis and other motor neuron disorders, progressive neural muscular atrophy, retinitis pigmentosa, hereditary ataxias, multiple sclerosis and other demyelinating diseases, bacterial and viral meningitis, brain abscess, subdural empyema, epidural abscess, suppurative intracranial thrombophlebitis, myelitis and radiculitis, viral central nervous system disease, prion diseases including kuru, Creutzfeldt-Jakob disease, and Gerstmann-Straussler-Scheinker syndrome, fatal familial insomnia, nutritional and metabolic diseases of the nervous system, neurofibromatosis, tuberous sclerosis, cerebelloretinal hemangioblastomatosis, encephalotrigeminal syndrome, mental retardation and other developmental disorders of the central nervous system including Down syndrome, cerebral palsy, neuroskeletal disorders, autonomic nervous system disorders, cranial nerve disorders, spinal cord diseases, muscular dystrophy and other neuromuscular disorders, peripheral nervous system disorders, dermatomyositis and polymyositis, inherited, metabolic, endocrine, and toxic myopathies, myasthenia gravis, periodic paralysis, mental disorders including mood, anxiety, and schizophrenic disorders, seasonal affective disorder (SAD), akathisia, amnesia, catatonia, diabetic neuropathy, tardive dyskinesia, dystonias, paranoid psychoses, postherpetic neuralgia, Tourette's disorder, progressive supranuclear palsy, corticobasal degeneration, and familial frontotemporal dementia; a developmental disorder such as renal tubular acidosis, anemia, Cushing's syndrome, achondroplastic dwarfism, Duchenne and Becker muscular dystrophy, epilepsy, gonadal dysgenesis, WAGR syndrome (Wilms' tumor, aniridia, genitourinary abnormalities, and mental retardation), Smith-Magenis syndrome, myelodysplastic syndrome, hereditary mucoepithelial dysplasia, hereditary keratodermas, hereditary neuropathies such as Charcot-Marie-Tooth disease and neurofibromatosis, hypothyroidism, hydrocephalus, seizure disorders such as Sydenham's chorea and cerebral palsy, spina bifida, anencephaly, craniorachischisis, congenital glaucoma, cataract, and sensorineural hearing loss; and a cardiovascular disorder such as congestive heart failure, ischemic heart disease, angina pectoris, myocardial infarction, hypertensive heart disease, degenerative valvular heart disease, calcific aortic valve stenosis, congenitally bicuspid aortic valve, mitral annular calcification, mitral valve prolapse, rheumatic fever and rheumatic heart disease, infective endocarditis, nonbacterial thrombotic endocarditis, endocarditis of systemic lupus erythematosus, carcinoid heart disease, cardiomyopathy, myocarditis, pericarditis, neoplastic heart disease, congenital heart disease, complications of cardiac transplantation, arteriovenous fistula, atherosclerosis, hypertension, vasculitis, Raynaud's disease, aneurysms, arterial dissections, varicose veins, thrombophlebitis and phlebothrombosis, vascular tumors, and complications of thrombolysis, balloon angioplasty, vascular replacement, and coronary artery bypass graft surgery. Exemplary diseases or conditions include, e.g., MS, SLE, ITP, IDDM, MG, CLL, CD, RA, Factor VIII Hemophilia, transplantation, arteriosclerosis, Sjogren's Syndrome, Kawasaki Disease, AHA, ulcerative colitis, multiple myeloma, Glomerulonephritis, seasonal allergies, and IgA Nephropathy.

In other embodiments of the invention, a scaffold or binding protein is conjugated, optionally through a linker, to a toxin, pro-drug, or other molecule (e.g., a protein, nucleic acid, organic small molecule, etc.) suitable for use as a pharmaceutical or therapeutic agent. Non-limiting examples of proteins include cytokines, chemokines, growth factors, interleukins, cell-surface proteins, extracellular domains, cell surface receptors, and cytotoxins. The conjugated scaffold or binding protein delivers the attached molecule to a location bound by the binding site of the scaffold or binding protein. Such forms of the invention may be used in a method of decreasing the viability of a cell, preferably a disease associated cell, such as a cancer cell or virally infected cell. Stated differently, the invention provides a method of targeting a cell expressing a cell surface molecule by use of a scaffold or binding protein of the invention. Such a method comprises contacting said cell with a scaffold or binding protein of the invention which binds said cell surface molecule.

In the case of a cancer cell, such as those of the cancers listed above, the scaffold or binding protein is one which preferably binds an external cell surface molecule of the cell with sufficient specificity to minimize undesirable binding to non-cancer cells. Similarly, in the case of a virally infected cell, the scaffold or binding protein is one which preferably binds a viral antigen expressed on the external cell surface of an infected cell with sufficient specificity to minimize undesirable binding to non-infected cells.

Thus the invention also provides a method of decreasing the viability of a cell, said method comprising covalently linking a cellular toxin or pro-drug to a scaffold or binding protein of the invention and contacting the linked scaffold or binding protein with a cell comprising a cell surface molecule bound by the scaffold or binding protein to decrease the viability of the cell. Preferably, the cell is a cancer cell, expressing a cell surface marker specific to the cancer cell as described above. Alternatively, the cell is a virally infected cell, expressing a viral antigen, on the cell surface, that is specific to virally infected cells as described above.

Alternatively, the invention provides for the selection of a scaffold or binding protein which binds a cell surface molecule such that the binding of one or multiple scaffolds or binding proteins to the cell through the molecule triggers, or is sufficient to activate, a cell death program in the bound cell. A non-limiting example of such a scaffold or binding protein is one that is analogous to Fas ligand or an antibody against Fas which triggers apoptosis of a cell upon binding to Fas expressed on the cell.

Therefore, the invention provides for the use of a scaffold or binding protein as disclosed herein as a therapeutic agent for use in the treatment of disease or other unwanted conditions. Alternatively, a scaffold or binding protein may be used in the prophylactic treatment of a disease or unwanted condition. The treatments of the invention include both in vivo and ex vivo administration. Preferably, the scaffold or binding protein is formulated as a composition comprising a pharmaceutically acceptable excipient, optionally for delayed release (or slow release over time). Sterile formulations of a scaffold or binding protein are also contemplated.

With respect to in vivo embodiments, a scaffold or binding protein is typically administered or transferred directly to the cells to be treated or to the tissue site of interest via intramuscular, intradermal, subdermal, subcutaneous, oral, intraperitoneal, intrathecal, or intravenous procedures. Alternatively, a scaffold or binding protein can be placed within a cavity of the body, such as during surgery, or by inhalation, or vaginal or rectal administration. With respect to ex vivo embodiments, the contacted cells are returned or delivered to the site from which they were obtained or to another site in the subject to be treated. The subject need not be that from which the cells were obtained. The treated cells may be optionally grafted onto a tissue or organ before being returned or alternatively delivered to the blood or lymph system using standard delivery or transfusion techniques.

Subjects that may be treated with a scaffold or binding protein of the invention include, but are not limited to, a mammal, including a human, primate, dog, cat, mouse, pig, cow, goat, rabbit, rat, guinea pig, hamster, horse, sheep; or a non-mammalian vertebrate such as a bird (e.g., a chicken or duck), or fish; or an invertebrate.

The invention also provides for compositions comprising a scaffold or binding protein disclosed herein. Non-limiting examples include attachment of a scaffold or binding protein to a surface, such as that of a tube, well, or dish; attachment to a matrix of an affinity material; or attachment to beads, a column, a solid support, or a microarray.

The compositions and methods of the present invention are ideally suited for preparation of kits produced in accordance with well known procedures. The invention thus provides kits comprising agents (like a scaffold or binding protein, or a library of scaffolds or binding proteins, described herein as non-limiting examples) for use in one or more methods as disclosed herein. Such kits, optionally comprising an agent with an identifying description or label or instructions relating to their use in the methods of the present invention, are provided. Such a kit may comprise containers, each with one or more of the various reagents (typically in concentrated form) or devices utilized in the methods. A set of instructions will also typically be included. Standards for calibrating the binding of a scaffold or binding protein to a ligand may also be included in the kits of the invention.

Alternatively, a kit of the invention may comprise one or more reagents for production of a library of scaffolds or binding proteins, such as that embodied in phage particles which express individual members of the library. Such kits may contain vectors, such as initial phage particles, and cells for their propagation and plating as well as expression of scaffolds or binding proteins.

Having now generally described the invention, the same will be more readily understood through reference to the following examples which are provided by way of illustration, and are not intended to be limiting of the present invention, unless specified.

EXAMPLES

The following examples are offered to illustrate, but not to limit, the claimed invention.

Example 1 Crystal Structures of Mtd Variants

Structural comparisons of Mtd-P1, -3c, -M1, -I1, and -N1 revealed that the main chain conformation of the CTL domain is remarkably invariant, despite half of the variable residues being on loop regions (FIG. 3C). The binding site in these variants is well ordered, having average main chain B-factors ranging from ˜9 Å² in Mtd-P1 to 24 Å² in Mtd-M1 and with density visible for all but one side chain (Phe-346 in Mtd-I1). Providing stabilization to these loops in Mtd are two features unique to the Mtd CTL-fold, namely the two inserts and trimeric assembly.

The inserts form hydrogen bonds to VR, including three to side chains of three invariant serines in VR. Ser-270 and Glu-267 from the second insert form hydrogen bonds to the invariant VR residues Ser-351 and Ser-353, respectively (FIG. 3D), and main chain atoms of the first insert form hydrogen bonds to invariant VR residue Ser-365 (not depicted). These interactions are supplemented by hydrogen bonds between the inserts and main chain (scaffold) atoms of the VR. Likewise, trimeric assembly contributes to stabilizing VR, specifically through contacts from a neighboring monomer's extensive β2β3 loop. The β2β3 loop from one monomer contributes not only the aforementioned invariant tyrosines (322 and 333) to a neighbor's binding site (FIG. 3B), but also hydrogen bonds to the invariant VR residue Arg-354 and to main chain (scaffold) atoms of VR (FIG. 3E). The β2β3 loop has the same intertwining conformation in all Mtd variants examined, being positioned over invariant residues (i.e., 351-356) in a neighbor's binding site.

The binding sites of the five Mtd variants studied differ greatly in their pattern of hydrophobicities. FIG. 4A shows that Mtd-P1 and Mtd-I1 have highly hydrophobic binding sites, and that the continuity of the hydrophobic surface decreases successively for Mtd-3c, -M1, and -N1, with this last one having nine TR-encoded, mostly hydrophilic residues (FIG. 1). The binding sites of Mtd-P1 and -I1 accommodate four to five large, exposed hydrophobic residues, and although a preponderance of exposed hydrophobic surface is correlated with protein instability, both Mtd-P1 and -I1 are found to be highly stable proteins. The invariant area surrounding the binding site is largely hydrophilic, most likely aiding protein stability.

Example 2 Basis of Mtd to Ligand Interactions

To understand the basis of Mtd interactions with its ligand, a cell surface receptor, we characterized association between Mtd-P1 and the Bordetella receptor pertactin. The pertactin ectodomain (Prn-E) was incubated with Mtd variants and found by a coprecipitation assay to associate most strongly with Mtd-P1 but also with Mtd-3c and Mtd-M1. As a measure of specificity, Prn-E was not found to associate with Mtd-I1 or Mtd-N1. The three Mtd variants that are found to bind pertactin have in common the variable residue Tyr-359, previously shown by sequence comparison to be a consistent determinant for pertactin interaction. The presence of a tyrosine residue in the binding pocket is consistent with the presence of a number of hydrophobic surface-exposed patches on Prn-E (see Emsley, P., et al. Structure of Bordetella pertussis virulence factor P.69 pertactin. Nature 381, 90-2 (1996)). The maintenance of Prn affinity in some of these Mtd variants agrees with the relatively high frequency with which the phage adopts the BPP phenotype.

Despite each monomer providing a discrete binding site, the stoichiometry of association between Mtd and Prn-E is 3:1, as assessed by static light scattering. This may reflect steric occlusion of empty binding sites by elongated pertactin or pseudo-symmetric binding. The affinity of Mtd for Prn-E has a K_(D) of ~3 μM as measured by isothermal titration calorimetry (ITC). Because Bordetella phage has six tail fibers with each fiber appearing to have two Mtd trimers, the affinity is likely to translate to high avidity during infection. The ITC experiment also demonstrated that the endothermic interaction between the two molecules is entropically driven, as would be expected from the hydrophobic binding site of Mtd-P1. The affinity of Mtd-M1 for Prn-E is too low to be reliably measured by ITC, but a K_(D) of ≥200 μM is estimated, suggesting that the boundary between a productive and nonproductive interaction lies between 3 and ≥200 μM.
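The dissociation constants above can be put on a free energy scale with the standard relation ΔG° = RT ln K_(D). The following is a minimal sketch, not part of the described experiments: the temperature of 298.15 K, the 1 M standard state, and the function names are assumptions introduced only for illustration, since the ITC conditions are not stated here.

```python
import math

R_J_PER_MOL_K = 8.314462618  # molar gas constant, J/(mol*K)


def binding_free_energy_kj(k_d_molar, temp_kelvin=298.15):
    """Standard binding free energy in kJ/mol from a dissociation constant.

    Uses dG = R*T*ln(K_D) with K_D in mol/L (1 M standard state).
    The default temperature is an assumption, not a value from the text.
    """
    if k_d_molar <= 0:
        raise ValueError("K_D must be positive")
    return R_J_PER_MOL_K * temp_kelvin * math.log(k_d_molar) / 1000.0


def entropy_term_kj(delta_g_kj, delta_h_kj):
    """Return -T*dS in kJ/mol from dG = dH - T*dS.

    For an endothermic interaction (dH > 0) this term must be more
    negative than dG, which is what 'entropically driven' means.
    """
    return delta_g_kj - delta_h_kj


if __name__ == "__main__":
    dg_p1 = binding_free_energy_kj(3e-6)    # K_D of ~3 uM (Mtd-P1)
    dg_m1 = binding_free_energy_kj(200e-6)  # K_D of >=200 uM (Mtd-M1, a bound)
    print(f"Mtd-P1: {dg_p1:.1f} kJ/mol")
    print(f"Mtd-M1: {dg_m1:.1f} kJ/mol or weaker")
    print(f"Gap:    {dg_m1 - dg_p1:.1f} kJ/mol or more")
```

Under these assumptions the two values come out at roughly -31.5 and -21.1 kJ/mol, so the boundary between productive and nonproductive binding would span on the order of 10 kJ/mol.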

Example 3 CTL-Fold in Other DGRs

A number of other putative DGRs have been identified in phage and bacterial genomes. These resemble the Bordetella phage DGR in having sequence-related reverse transcriptases, similar arrangements of VR and TR, adenines constituting the main differences between VR and TR, and IMH-like elements at the end of VR. However, the putative variable proteins have no obvious sequence relationship to Mtd or other proteins. Because there appears to be no genetic requirement for VR and its IMH element to be positioned at the very C-terminus of a protein, the variations in positioning likely reflect the necessities of protein binding requirements as specified by the CTL-fold. Despite the low sequence identity among these proteins (~17%), we have been able to use the structure of Mtd along with considerations about variability to construct a sequence alignment consisting of the β2β3β4β4′ sheet of the CTL-fold (see FIG. 5). Most notably, the invariant Mtd binding site residue Trp-345 is seen to be present in a highly conserved ‘GGXW’ motif. Invariant residues (Ser-351, Ser-353, Arg-354) involved in loop stabilization, trimeric contacts, or both are also generally conserved. As in Mtd, residues differing between VR and TR or ones that could potentially vary through an adenine-directed mechanism in these proteins are located chiefly between the β3 and β4′ strands. These conclusions are bolstered by profile-based sequence alignment, which provides statistical confidence for the putative variable proteins from such diverse organisms as Treponema denticola, Vibrio harveyi ML phage, and the various cyanobacteria being related to Mtd and consequently having a CTL-fold.
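As an illustration of how a candidate sequence might be scanned for the conserved ‘GGXW’ motif, the following is a minimal sketch. It is not the structure-guided or profile-based alignment described above; the function name is hypothetical, and treating X as "any single residue" is an assumption. The example string is the pattern of SEQ ID NO:23 as recited in claim 6, with X kept as a literal placeholder.

```python
import re

# 'GG', then any single residue letter, then 'W'. The lookahead lets
# overlapping occurrences be reported as well.
GGXW_PATTERN = re.compile(r"(?=(GG[A-Z]W))")


def find_ggxw_motifs(sequence):
    """Return (zero-based start index, matched text) for each GGXW hit."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in GGXW_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Pattern of SEQ ID NO:23 from claim 6, written without hyphens.
    seq_id_23 = "AALFGGXWXXTSXSGSRAAXWXXGPSXSXAXX"
    for start, hit in find_ggxw_motifs(seq_id_23):
        print(f"{hit} at position {start + 1}")
```

A plain pattern match of this kind only flags candidates; at ~17% sequence identity, assigning a CTL-fold would still rest on the alignment and structural considerations described in the text.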

All references cited herein are hereby incorporated by reference in their entireties, whether previously specifically incorporated or not. As used herein, the terms “a”, “an”, and “any” are each intended to include both the singular and plural forms.

Having now fully described this invention, it will be appreciated by those skilled in the art that the same can be performed within a wide range of equivalent parameters, concentrations, and conditions without departing from the spirit and scope of the invention and without undue experimentation. While this invention has been described in connection with specific embodiments thereof, it will be understood that it is capable of further modifications. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains and as may be applied to the essential features hereinbefore set forth.

1. A non-naturally occurring protein with a variable binding site, said protein comprising a C-type lectin fold (CTL-fold) or part thereof; and the variable binding site which is located in the CTL-fold within the region between the β3 and β5 strands; wherein the variable binding site is displayed and oriented by the CTL-fold; and wherein the protein is not a naturally occurring Bordetella-related Major Tropism Determinant (MTD).
 2. (canceled)
 3. The protein of claim 1 wherein said CTL-fold is a tropism determinant and wherein the protein is expressed on the exterior of a phage or bacterium.
 4. (canceled)
 5. (canceled)
 6. The protein of claim 1 wherein said variable binding site comprises (SEQ ID NO:23) -A-A-L-F-G-G-X-W-X-X-T-S-X-S-G-S-R-A-A-X-W-X-X-G-P-S-X-S-X-A-X-X-; (SEQ ID NO:24) -X-W-X-X-T-S-X-S-G-S-R-A-A-X-W-X-X-G-P-S-X-S-X-A-X-X-G-A-R-G-V-C-; (SEQ ID NO:25) -A-A-L-F-G-G-X-W-X-X-T-S-X-S-G-S-R-A-A-X-W-X-X-G-P-S-X-S-X-A-X-X-G-A-R-G-V-C-; or (SEQ ID NO:26) -X-W-X-X-T-S-X-S-G-S-R-A-A-X-W-X-X-G-P-S-X-S-X-A-X-X-G-A-R-G-V-C-D-H-L-I-L-E.


7. The protein of claim 6 wherein said variable binding site is about 44-45 amino acid residues in length.
 8. (canceled)
 9. (canceled)
 10. (canceled)
 11. (canceled)
 12. (canceled)
 13. (canceled)
 14. The protein of claim 1, further comprising a label attached to said protein.
 15. The protein of claim 14 wherein said label is a covalently attached, directly detectable label.
 16. The protein of claim 1, further comprising a cellular toxin or pro-drug attached to said protein.
 17. A method of decreasing the viability of a mammalian cell, said method comprising covalently linking a cellular toxin or pro-drug to the protein of claim 1; and contacting said linked protein with a mammalian cell comprising a cell surface molecule which binds said protein to decrease the viability of said cell.
 18. The method of claim 17 wherein said mammalian cell is a cancer cell in a human subject or other mammal.
 19. A method of detecting a bacterial cell, said method comprising contacting the protein of claim 1 with a bacterial cell comprising a cell surface molecule which binds said protein; and detecting said protein on said bacterial cell.
 20. A method of targeting a cell expressing a cell surface molecule, said method comprising contacting said cell with a protein according to claim 1 which binds said cell surface molecule.
 21. (canceled)
 22. (canceled)
 23. (canceled)
 24. (canceled)
 25. The protein of claim 1 wherein the variable binding site is selected from the group consisting of SEQ ID NOs:1, 4, 5, 12, 14, 16, 18, 21, and 22.
 26. The protein of claim 1 wherein the variable binding site is selected from the group consisting of SEQ ID NOs:34-45.
 27. The protein of claim 1 wherein the variable binding site is attached to all or a part of SEQ ID NOs:2, 3, 6-11, 13, 15, 17 or 19.
 28. The protein of claim 1 wherein the CTL-fold is all or part of SEQ ID NO:20.
 29. The protein of claim 1 wherein the sequence variation within the variable binding site causes no conformational shifts in the CTL-fold greater than 1.8 Å.
 30. The protein of claim 1 wherein the variable binding site is derived from a cyanobacterium, Treponema denticola, Bifidobacterium longum, Bacteroides thetaiotaomicron, or a Vibrio harveyi ML phage protein.
 31. The protein of claim 1 wherein the CTL-fold is bacterial or mammalian in origin.
 32. The protein of claim 31 wherein the CTL-fold is a selectin, mannose binding protein, a natural killer receptor, an eosinophilic major basic protein, a tumour necrosis factor-stimulated gene product, an enteropathogenic E. coli intimin, or a Yersinia pseudotuberculosis invasin.
 33. A library of non-naturally occurring proteins with variable binding sites wherein each protein comprises a CTL-fold or part thereof; and a binding site which is located within the CTL-fold within the region between the β3 and β5 strands; wherein the binding sites are displayed and oriented by the CTL-fold.
 34. The library of claim 33 wherein the proteins are expressed as phage displays, ribosome displays, polysome displays, or cell surface displays; or are presented in arrays or microarray formats.
 35. The library of claim 33 wherein the proteins are in monomeric, dimeric, trimeric, or multimeric form; and/or are attached non-covalently to a second protein.
 36. The library of claim 33 wherein the proteins are expressed on the exterior of a phage or bacterium.
 37. The library of claim 33 wherein labels are attached to the proteins.
 38. The library of claim 33 further comprising cellular toxins or pro-drugs attached to the proteins.